2025-09-07T06:12:37.7062507Z Current runner version: '2.328.0'
2025-09-07T06:12:37.7069390Z Runner name: 'i-04d2b1bca56e299e2'
2025-09-07T06:12:37.7070314Z Runner group name: 'default'
2025-09-07T06:12:37.7071175Z Machine name: 'ip-10-0-63-32'
2025-09-07T06:12:37.7074332Z ##[group]GITHUB_TOKEN Permissions
2025-09-07T06:12:37.7076678Z Contents: read
2025-09-07T06:12:37.7077242Z Metadata: read
2025-09-07T06:12:37.7078016Z Packages: read
2025-09-07T06:12:37.7078585Z ##[endgroup]
2025-09-07T06:12:37.7080899Z Secret source: Actions
2025-09-07T06:12:37.7081980Z Prepare workflow directory
2025-09-07T06:12:37.7648266Z Prepare all required actions
2025-09-07T06:12:37.7693499Z Getting action download info
2025-09-07T06:12:37.9963128Z Download action repository 'pytorch/test-infra@main' (SHA:548a4bc624d43a01cdf165a63b041f0ae014ddbd)
2025-09-07T06:12:39.9463211Z Download action repository 'pytorch/pytorch@main' (SHA:93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T06:12:54.2075235Z Download action repository 'actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874' (SHA:50769540e7f4bd5e21e526ee35c689e35e0d6874)
2025-09-07T06:12:54.5908936Z Getting action download info
2025-09-07T06:12:54.7314524Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955)
2025-09-07T06:12:54.9983761Z Complete job name: Build cu129 vLLM wheel
2025-09-07T06:12:55.0691942Z A job started hook has been configured by the self-hosted runner administrator
2025-09-07T06:12:55.0813920Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-09-07T06:12:55.0824996Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:12:55.0825773Z ##[endgroup]
2025-09-07T06:12:56.5561947Z Runner Type: linux.12xlarge.memory
2025-09-07T06:12:56.5562522Z Instance Type: r5.12xlarge
2025-09-07T06:12:56.5562821Z AMI Name: unknown
2025-09-07T06:12:56.5594018Z AMI ID: ami-05ffe3c48a9991133
2025-09-07T06:13:02.2841257Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-09-07T06:13:02.2841737Z with:
2025-09-07T06:13:02.2842342Z github-secret: ***
2025-09-07T06:13:02.2842622Z activate-with-label: false
2025-09-07T06:13:02.2842915Z label: with-ssh
2025-09-07T06:13:02.2843156Z remove-existing-keys: true
2025-09-07T06:13:02.2843443Z fail-silently: true
2025-09-07T06:13:02.2843682Z env:
2025-09-07T06:13:02.2843901Z PY_VERS: 3.12
2025-09-07T06:13:02.2844203Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9
2025-09-07T06:13:02.2844603Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:13:02.2844896Z BUILD_DEVICE: cu129
2025-09-07T06:13:02.2845144Z ##[endgroup]
2025-09-07T06:13:02.4088800Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-09-07T06:13:02.4089945Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-09-07T06:13:02.4332451Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-09-07T06:13:02.4333061Z with:
2025-09-07T06:13:02.4333346Z submodules: false
2025-09-07T06:13:02.4333631Z fetch-depth: 0
2025-09-07T06:13:02.4333896Z env:
2025-09-07T06:13:02.4334119Z PY_VERS: 3.12
2025-09-07T06:13:02.4334479Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9
2025-09-07T06:13:02.4334911Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:13:02.4335244Z BUILD_DEVICE: cu129
2025-09-07T06:13:02.4335504Z ##[endgroup]
2025-09-07T06:13:02.4429120Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:13:02.4430305Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:13:02.4441305Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:13:02.4441727Z env:
2025-09-07T06:13:02.4442011Z PY_VERS: 3.12
2025-09-07T06:13:02.4442339Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9
2025-09-07T06:13:02.4442768Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:13:02.4443246Z BUILD_DEVICE: cu129
2025-09-07T06:13:02.4443516Z ##[endgroup]
2025-09-07T06:13:02.4577987Z ##[group]Run # Use all available CPUs for fetching
2025-09-07T06:13:02.4578509Z # Use all available CPUs for fetching
2025-09-07T06:13:02.4578901Z cd "${GITHUB_WORKSPACE}"
2025-09-07T06:13:02.4579294Z git config --global fetch.parallel 0
2025-09-07T06:13:02.4579749Z git config --global submodule.fetchJobs 0
2025-09-07T06:13:02.4580137Z
2025-09-07T06:13:02.4580562Z # Clean workspace. The default checkout action should also do this, but
2025-09-07T06:13:02.4581098Z # do it here as well just in case
2025-09-07T06:13:02.4581470Z if [[ -d .git ]]; then
2025-09-07T06:13:02.4581797Z   if [ -z "${NO_SUDO}" ]; then
2025-09-07T06:13:02.4582173Z   sudo git clean -ffdx
2025-09-07T06:13:02.4582486Z   else
2025-09-07T06:13:02.4582756Z   git clean -ffdx
2025-09-07T06:13:02.4583067Z   fi
2025-09-07T06:13:02.4583320Z fi
2025-09-07T06:13:02.4589171Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:13:02.4589552Z env:
2025-09-07T06:13:02.4589769Z PY_VERS: 3.12
2025-09-07T06:13:02.4590072Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9
2025-09-07T06:13:02.4590470Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:13:02.4590759Z BUILD_DEVICE: cu129
2025-09-07T06:13:02.4591008Z NO_SUDO:
2025-09-07T06:13:02.4591214Z ##[endgroup]
2025-09-07T06:13:02.4811959Z ##[group]Run actions/checkout@v4
2025-09-07T06:13:02.4812491Z with:
2025-09-07T06:13:02.4813028Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634
2025-09-07T06:13:02.4813530Z fetch-depth: 0
2025-09-07T06:13:02.4813836Z submodules: false
2025-09-07T06:13:02.4814306Z show-progress: false
2025-09-07T06:13:02.4814672Z repository: pytorch/pytorch
2025-09-07T06:13:02.4815259Z token: ***
2025-09-07T06:13:02.4815689Z ssh-strict: true
2025-09-07T06:13:02.4816068Z ssh-user: git
2025-09-07T06:13:02.4816415Z persist-credentials: true
2025-09-07T06:13:02.4816912Z clean: true
2025-09-07T06:13:02.4817297Z sparse-checkout-cone-mode: true
2025-09-07T06:13:02.4817701Z fetch-tags: false
2025-09-07T06:13:02.4818123Z lfs: false
2025-09-07T06:13:02.4818448Z set-safe-directory: true
2025-09-07T06:13:02.4818848Z env:
2025-09-07T06:13:02.4819168Z PY_VERS: 3.12
2025-09-07T06:13:02.4819646Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9
2025-09-07T06:13:02.4820195Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:13:02.4820611Z BUILD_DEVICE: cu129
2025-09-07T06:13:02.4821057Z ##[endgroup]
2025-09-07T06:13:02.6032507Z Syncing repository: pytorch/pytorch
2025-09-07T06:13:02.6034213Z ##[group]Getting Git version info
2025-09-07T06:13:02.6035281Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-09-07T06:13:02.6036242Z [command]/usr/bin/git version
2025-09-07T06:13:02.6036626Z git version 2.47.1
2025-09-07T06:13:02.6046790Z ##[endgroup]
2025-09-07T06:13:02.6056524Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/686741a3-f594-4f89-946d-bbfa8861caa8/.gitconfig'
2025-09-07T06:13:02.6077103Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/686741a3-f594-4f89-946d-bbfa8861caa8' before making global git config changes
2025-09-07T06:13:02.6078529Z Adding repository directory to the temporary git global config as a safe directory
2025-09-07T06:13:02.6081911Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-09-07T06:13:02.6121776Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-09-07T06:13:02.6125254Z ##[group]Initializing the repository
2025-09-07T06:13:02.6129817Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-09-07T06:13:02.6189218Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-09-07T06:13:02.6190068Z hint: is subject to change. To configure the initial branch name to use in all
2025-09-07T06:13:02.6191101Z hint: of your new repositories, which will suppress this warning, call:
2025-09-07T06:13:02.6191838Z hint:
2025-09-07T06:13:02.6192279Z hint: git config --global init.defaultBranch <name>
2025-09-07T06:13:02.6192774Z hint:
2025-09-07T06:13:02.6193325Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-09-07T06:13:02.6194129Z hint: 'development'. The just-created branch can be renamed via this command:
2025-09-07T06:13:02.6194692Z hint:
2025-09-07T06:13:02.6241015Z hint: git branch -m <name>
2025-09-07T06:13:02.6241889Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-09-07T06:13:02.6243394Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-09-07T06:13:02.6284013Z ##[endgroup]
2025-09-07T06:13:02.6284547Z ##[group]Disabling automatic garbage collection
2025-09-07T06:13:02.6285812Z [command]/usr/bin/git config --local gc.auto 0
2025-09-07T06:13:02.6325146Z ##[endgroup]
2025-09-07T06:13:02.6325625Z ##[group]Setting up auth
2025-09-07T06:13:02.6330354Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-09-07T06:13:02.6360025Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-09-07T06:13:02.6727156Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-09-07T06:13:02.6754175Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-09-07T06:13:02.7064347Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-09-07T06:13:02.7114877Z ##[endgroup]
2025-09-07T06:13:02.7115414Z ##[group]Fetching the repository
2025-09-07T06:13:02.7121539Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-09-07T06:13:52.6683300Z From https://github.com/pytorch/pytorch
2025-09-07T06:13:52.6683878Z * [new branch] 160583 -> origin/160583
2025-09-07T06:13:52.6684581Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+
2025-09-07T06:13:52.6685539Z * [new branch] 5addvllmbuild -> origin/5addvllmbuild
2025-09-07T06:13:52.6686249Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-09-07T06:13:52.6687352Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-09-07T06:13:52.6688041Z * [new branch] ISSUE-154849 -> origin/ISSUE-154849
2025-09-07T06:13:52.6688779Z * [new branch] JackCaoG/dynamo_make_fx_non_core_aten_ops -> origin/JackCaoG/dynamo_make_fx_non_core_aten_ops
2025-09-07T06:13:52.6689558Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128
2025-09-07T06:13:52.6690195Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug
2025-09-07T06:13:52.6691512Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix
2025-09-07T06:13:52.6692927Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue
2025-09-07T06:13:52.6694031Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable
2025-09-07T06:13:52.6695206Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero
2025-09-07T06:13:52.6697239Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging
2025-09-07T06:13:52.6698291Z * [new branch] VLA_exp -> origin/VLA_exp
2025-09-07T06:13:52.6699814Z * [new branch] actually-run-mps-aot-inductor -> origin/actually-run-mps-aot-inductor
2025-09-07T06:13:52.6701064Z * [new branch] add-missing-args-normalization -> origin/add-missing-args-normalization
2025-09-07T06:13:52.6702258Z * [new branch] add-user-guide-structure -> origin/add-user-guide-structure
2025-09-07T06:13:52.6703762Z * [new branch] add-vllm-nightly-build -> origin/add-vllm-nightly-build
2025-09-07T06:13:52.6704917Z * [new branch] add_compile_benchmarking -> origin/add_compile_benchmarking
2025-09-07T06:13:52.6706109Z * [new branch] addmm-heuristic -> origin/addmm-heuristic
2025-09-07T06:13:52.6707306Z * [new branch] addsimde -> origin/addsimde
2025-09-07T06:13:52.6708519Z * [new branch] addvllmtest -> origin/addvllmtest
2025-09-07T06:13:52.6710272Z * [new branch] adi/acl_upgrade -> origin/adi/acl_upgrade
2025-09-07T06:13:52.6711386Z * [new branch] adi/test -> origin/adi/test
2025-09-07T06:13:52.6712550Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm
2025-09-07T06:13:52.6713689Z * [new branch] adi/test_fusions -> origin/adi/test_fusions
2025-09-07T06:13:52.6714822Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9
2025-09-07T06:13:52.6716242Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change
2025-09-07T06:13:52.6717119Z * [new branch] adi/test_timm -> origin/adi/test_timm
2025-09-07T06:13:52.6718735Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change
2025-09-07T06:13:52.6720863Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16
2025-09-07T06:13:52.6722340Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook
2025-09-07T06:13:52.6723250Z * [new branch] alt-disable -> origin/alt-disable
2025-09-07T06:13:52.6725075Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-09-07T06:13:52.6726142Z * [new branch] angelayi/aoti_inductor_fx -> origin/angelayi/aoti_inductor_fx
2025-09-07T06:13:52.6727297Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark
2025-09-07T06:13:52.6728538Z * [new branch] angelayi/benchmark2 -> origin/angelayi/benchmark2
2025-09-07T06:13:52.6729761Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-09-07T06:13:52.6730903Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader
2025-09-07T06:13:52.6732739Z * [new branch] angelayi/custom_op_subgraph -> origin/angelayi/custom_op_subgraph
2025-09-07T06:13:52.6734082Z * [new branch] angelayi/customop -> origin/angelayi/customop
2025-09-07T06:13:52.6735694Z * [new branch] angelayi/fake_cache_empty -> origin/angelayi/fake_cache_empty
2025-09-07T06:13:52.6736948Z * [new branch] angelayi/is_symbolic_tracing -> origin/angelayi/is_symbolic_tracing
2025-09-07T06:13:52.6738049Z * [new branch] angelayi/item -> origin/angelayi/item
2025-09-07T06:13:52.6739288Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight
2025-09-07T06:13:52.6740450Z * [new branch] angelayi/opoverload -> origin/angelayi/opoverload
2025-09-07T06:13:52.6741654Z * [new branch] angelayi/pattern -> origin/angelayi/pattern
2025-09-07T06:13:52.6742889Z * [new branch] angelayi/pytree -> origin/angelayi/pytree
2025-09-07T06:13:52.6744223Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers
2025-09-07T06:13:52.6745484Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input
2025-09-07T06:13:52.6746673Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp
2025-09-07T06:13:52.6747783Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size
2025-09-07T06:13:52.6749362Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc
2025-09-07T06:13:52.6751064Z * [new branch] aoti_target_windows -> origin/aoti_target_windows
2025-09-07T06:13:52.6752240Z * [new branch] aoti_weight_sharing -> origin/aoti_weight_sharing
2025-09-07T06:13:52.6753717Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-09-07T06:13:52.6754872Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-09-07T06:13:52.6756064Z * [new branch] atalman-patch-1 -> origin/atalman-patch-1
2025-09-07T06:13:52.6757460Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3
2025-09-07T06:13:52.6758611Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4
2025-09-07T06:13:52.6760005Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5
2025-09-07T06:13:52.6761255Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6
2025-09-07T06:13:52.6762621Z * [new branch] atalman_inductor_2.3.0 -> origin/atalman_inductor_2.3.0
2025-09-07T06:13:52.6764067Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1
2025-09-07T06:13:52.6765146Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0
2025-09-07T06:13:52.6766397Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x
2025-09-07T06:13:52.6767735Z * [new branch] autoupdate-transformers-pin-via-pr -> origin/autoupdate-transformers-pin-via-pr
2025-09-07T06:13:52.6769253Z * [new branch] bahuang/dtensor_demo -> origin/bahuang/dtensor_demo
2025-09-07T06:13:52.6770317Z * [new branch] bahuang/test -> origin/bahuang/test
2025-09-07T06:13:52.6772560Z * [new branch] base/1.5 -> origin/base/1.5
2025-09-07T06:13:52.6773936Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-09-07T06:13:52.6775864Z * [new branch] bc-lint-config -> origin/bc-lint-config
2025-09-07T06:13:52.6776510Z * [new branch] bc-lint-test-new-config -> origin/bc-lint-test-new-config
2025-09-07T06:13:52.6777749Z * [new branch] benchmark-updates -> origin/benchmark-updates
2025-09-07T06:13:52.6778911Z * [new branch] benchmarker_compat_with_do_bench -> origin/benchmarker_compat_with_do_bench
2025-09-07T06:13:52.6780041Z * [new branch] benchmarking-script -> origin/benchmarking-script
2025-09-07T06:13:52.6781792Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26
2025-09-07T06:13:52.6783421Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass
2025-09-07T06:13:52.6785090Z * [new branch] bf/cg-custom-wrapper -> origin/bf/cg-custom-wrapper
2025-09-07T06:13:52.6786145Z * [new branch] bf/cg-or-error -> origin/bf/cg-or-error
2025-09-07T06:13:52.6787226Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check
2025-09-07T06:13:52.6788436Z * [new branch] bf/cg-skip-1-kernel -> origin/bf/cg-skip-1-kernel
2025-09-07T06:13:52.6789527Z * [new branch] bf/cudagraph -> origin/bf/cudagraph
2025-09-07T06:13:52.6791091Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-09-07T06:13:52.6792847Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-09-07T06:13:52.6793897Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition
2025-09-07T06:13:52.6795041Z * [new branch] bf/default-recompile-reason -> origin/bf/default-recompile-reason
2025-09-07T06:13:52.6796225Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench
2025-09-07T06:13:52.6797336Z * [new branch] bf/exp -> origin/bf/exp
2025-09-07T06:13:52.6798502Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible
2025-09-07T06:13:52.6799692Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu
2025-09-07T06:13:52.6800858Z * [new branch] bf/partition-turn-on -> origin/bf/partition-turn-on
2025-09-07T06:13:52.6801981Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d
2025-09-07T06:13:52.6803000Z * [new branch] bf/rope -> origin/bf/rope
2025-09-07T06:13:52.6804265Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-09-07T06:13:52.6805422Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-09-07T06:13:52.6806501Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-09-07T06:13:52.6807593Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-09-07T06:13:52.6808685Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e
2025-09-07T06:13:52.6809773Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-09-07T06:13:52.6810855Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-09-07T06:13:52.6812341Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-09-07T06:13:52.6813495Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-09-07T06:13:52.6814689Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-09-07T06:13:52.6815787Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a
2025-09-07T06:13:52.6816900Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b
2025-09-07T06:13:52.6818158Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-09-07T06:13:52.6819210Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-09-07T06:13:52.6820322Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-09-07T06:13:52.6821506Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-09-07T06:13:52.6823328Z * [new branch] bowbao/bench_updates_stage -> origin/bowbao/bench_updates_stage
2025-09-07T06:13:52.6824467Z * [new branch] bowbao/dort_rewriter -> origin/bowbao/dort_rewriter
2025-09-07T06:13:52.6825551Z * [new branch] bowbao/wip_prs -> origin/bowbao/wip_prs
2025-09-07T06:13:52.6827612Z * [new branch] brister/break_tensorbox -> origin/brister/break_tensorbox
2025-09-07T06:13:52.6828789Z * [new branch] brister/custom_fx_backend -> origin/brister/custom_fx_backend
2025-09-07T06:13:52.6829923Z * [new branch] brister/fx_custom_triton -> origin/brister/fx_custom_triton
2025-09-07T06:13:52.6830986Z * [new branch] brister/tensor_box_output -> origin/brister/tensor_box_output
2025-09-07T06:13:52.6832207Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-09-07T06:13:52.6833295Z * [new branch] c57382a49 -> origin/c57382a49
2025-09-07T06:13:52.6834435Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa
2025-09-07T06:13:52.6835563Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa
2025-09-07T06:13:52.6837922Z * [new branch] camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 -> origin/camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8
2025-09-07T06:13:52.6839139Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-09-07T06:13:52.6840298Z * [new branch] cherry-pick-149654-by-pytorch_bot_bot_ -> origin/cherry-pick-149654-by-pytorch_bot_bot_
2025-09-07T06:13:52.6841488Z * [new branch] cherry-pick-151939-by-pytorch_bot_bot_ -> origin/cherry-pick-151939-by-pytorch_bot_bot_
2025-09-07T06:13:52.6842669Z * [new branch] cherry-pick-154174-by-pytorch_bot_bot_ -> origin/cherry-pick-154174-by-pytorch_bot_bot_
2025-09-07T06:13:52.6843890Z * [new branch] cherry-pick-156260-by-pytorch_bot_bot_ -> origin/cherry-pick-156260-by-pytorch_bot_bot_
2025-09-07T06:13:52.6845119Z * [new branch] cherry-pick-157453-by-pytorch_bot_bot_ -> origin/cherry-pick-157453-by-pytorch_bot_bot_
2025-09-07T06:13:52.6846378Z * [new branch] cherry-pick-157513-by-pytorch_bot_bot_ -> origin/cherry-pick-157513-by-pytorch_bot_bot_
2025-09-07T06:13:52.6847559Z * [new branch] cherry-pick-157695-by-pytorch_bot_bot_ -> origin/cherry-pick-157695-by-pytorch_bot_bot_
2025-09-07T06:13:52.6848886Z * [new branch] cherry-pick-157732-by-pytorch_bot_bot_ -> origin/cherry-pick-157732-by-pytorch_bot_bot_
2025-09-07T06:13:52.6850541Z * [new branch] cherry-pick-158537-by-pytorch_bot_bot_ -> origin/cherry-pick-158537-by-pytorch_bot_bot_
2025-09-07T06:13:52.6851834Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-09-07T06:13:52.6853139Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-09-07T06:13:52.6854756Z * [new branch] chilli/flex_vllm -> origin/chilli/flex_vllm
2025-09-07T06:13:52.6856076Z * [new branch] cleanup-inductor-benchmark-images -> origin/cleanup-inductor-benchmark-images
2025-09-07T06:13:52.6857096Z * [new branch] codex-testing -> origin/codex-testing
2025-09-07T06:13:52.6859320Z * [new branch] codex/add-helper-function-to-sizevars.py -> origin/codex/add-helper-function-to-sizevars.py
2025-09-07T06:13:52.6860445Z * [new branch] codex/add-helper-function-to-sizevars.py_2025-09-05 -> origin/codex/add-helper-function-to-sizevars.py_2025-09-05
2025-09-07T06:13:52.6861560Z * [new branch] codex/add-metadata-field-for-file-path -> origin/codex/add-metadata-field-for-file-path
2025-09-07T06:13:52.6863195Z * [new branch] codex/add-test-for-inductor-local-cache-behavior -> origin/codex/add-test-for-inductor-local-cache-behavior
2025-09-07T06:13:52.6864701Z * [new branch] codex/create-test-for-tensor-memory-leak-in-cudagraph -> origin/codex/create-test-for-tensor-memory-leak-in-cudagraph
2025-09-07T06:13:52.6865776Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-09-07T06:13:52.6866845Z * [new branch] codex/fix-issue-160415-in-pytorch -> origin/codex/fix-issue-160415-in-pytorch
2025-09-07T06:13:52.6868162Z * [new branch] codex/fix-noqengine-quantized-engine-support -> origin/codex/fix-noqengine-quantized-engine-support
2025-09-07T06:13:52.6869162Z * [new branch] codex/fix-pin_memory-error-handling -> origin/codex/fix-pin_memory-error-handling
2025-09-07T06:13:52.6870278Z * [new branch] codex/propose-fix-for-issue-160332 -> origin/codex/propose-fix-for-issue-160332
2025-09-07T06:13:52.6871640Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-09-07T06:13:52.6873124Z * [new branch] codex/remove-allow-untyped-defs-and-fix-type-errors -> origin/codex/remove-allow-untyped-defs-and-fix-type-errors
2025-09-07T06:13:52.6874282Z * [new branch] compile_fsdp2_disable_stream_and_event -> origin/compile_fsdp2_disable_stream_and_event
2025-09-07T06:13:52.6875060Z * [new branch] context_test -> origin/context_test
2025-09-07T06:13:52.6876526Z * [new branch] copilot/fix-157446 -> origin/copilot/fix-157446
2025-09-07T06:13:52.6877506Z * [new branch] copy_graph -> origin/copy_graph
2025-09-07T06:13:52.6879197Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests
2025-09-07T06:13:52.6880762Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml
2025-09-07T06:13:52.6881868Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs
2025-09-07T06:13:52.6883005Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2
2025-09-07T06:13:52.6884096Z * [new branch] csl/disable_flaky_cpp_test -> origin/csl/disable_flaky_cpp_test
2025-09-07T06:13:52.6885166Z * [new branch] csl/disable_periodic_test -> origin/csl/disable_periodic_test
2025-09-07T06:13:52.6886518Z * [new branch] csl/exclude_rocm_viable_strict -> origin/csl/exclude_rocm_viable_strict
2025-09-07T06:13:52.6887971Z * [new branch] csl/katex -> origin/csl/katex
2025-09-07T06:13:52.6889148Z * [new branch] csl/larger_runner -> origin/csl/larger_runner
2025-09-07T06:13:52.6890291Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff
2025-09-07T06:13:52.6891496Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding
2025-09-07T06:13:52.6893263Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker
2025-09-07T06:13:52.6894405Z * [new branch] csl/name_link_check_job -> origin/csl/name_link_check_job
2025-09-07T06:13:52.6895550Z * [new branch] csl/no_keep_goin_rocm -> origin/csl/no_keep_goin_rocm
2025-09-07T06:13:52.6896732Z * [new branch] csl/not_600_timeout -> origin/csl/not_600_timeout
2025-09-07T06:13:52.6897996Z * [new branch] csl/revert_open -> origin/csl/revert_open
2025-09-07T06:13:52.6899072Z * [new branch] csl/skip_build -> origin/csl/skip_build
2025-09-07T06:13:52.6900390Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-09-07T06:13:52.6901510Z * [new branch] csl/win_sccache -> origin/csl/win_sccache
2025-09-07T06:13:52.6902729Z * [new branch] cublasltrelax2 -> origin/cublasltrelax2
2025-09-07T06:13:52.6903997Z * [new branch] cublasrelax2 -> origin/cublasrelax2
2025-09-07T06:13:52.6905169Z * [new branch] cudnnsdparefactor -> origin/cudnnsdparefactor
2025-09-07T06:13:52.6906345Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict
2025-09-07T06:13:52.6907403Z * [new branch] czhuge_muon_dev -> origin/czhuge_muon_dev
2025-09-07T06:13:52.6909179Z * [new branch] d4l3k/delete_hook -> origin/d4l3k/delete_hook
2025-09-07T06:13:52.6910202Z * [new branch] dcp_zoc -> origin/dcp_zoc
2025-09-07T06:13:52.6911364Z * [new branch] debug-guard -> origin/debug-guard
2025-09-07T06:13:52.6912571Z * [new branch] delete-quant-docs -> origin/delete-quant-docs
2025-09-07T06:13:52.6916580Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2
2025-09-07T06:13:52.6918103Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3
2025-09-07T06:13:52.6919621Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4
2025-09-07T06:13:52.6921140Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0
2025-09-07T06:13:52.6922407Z * [new branch] dependabot/pip/dot-ci/docker/protobuf-5.29.5 -> origin/dependabot/pip/dot-ci/docker/protobuf-5.29.5
2025-09-07T06:13:52.6924005Z * [new branch] dependabot/pip/dot-github/requirements/protobuf-5.29.5 -> origin/dependabot/pip/dot-github/requirements/protobuf-5.29.5
2025-09-07T06:13:52.6925297Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-09-07T06:13:52.6926524Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-09-07T06:13:52.6928873Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd
2025-09-07T06:13:52.6930115Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked
2025-09-07T06:13:52.6931952Z * [new branch] dev/joona/cat -> origin/dev/joona/cat
2025-09-07T06:13:52.6933414Z * [new branch] dev/joona/cat_remove_graph -> origin/dev/joona/cat_remove_graph
2025-09-07T06:13:52.6934561Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag
2025-09-07T06:13:52.6936113Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString
2025-09-07T06:13:52.6937863Z * [new branch] dev/joona/maxpool2dwithindices_errmsg -> origin/dev/joona/maxpool2dwithindices_errmsg
2025-09-07T06:13:52.6939497Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14
2025-09-07T06:13:52.6941174Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa
2025-09-07T06:13:52.6942884Z * [new branch] dev/joona/topk_newapi -> origin/dev/joona/topk_newapi
2025-09-07T06:13:52.6944359Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf
2025-09-07T06:13:52.6945603Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d
2025-09-07T06:13:52.6946859Z * [new branch] disable -> origin/disable
2025-09-07T06:13:52.6947977Z * [new branch] e2e-baseline -> origin/e2e-baseline
2025-09-07T06:13:52.6949387Z * [new branch] eigen_for_sparse_addmm_v2 -> origin/eigen_for_sparse_addmm_v2
2025-09-07T06:13:52.6951378Z * [new branch] embg/test_inductor_ci_128B -> origin/embg/test_inductor_ci_128B
2025-09-07T06:13:52.6952497Z * [new branch] embg/test_inductor_ci_base -> origin/embg/test_inductor_ci_base
2025-09-07T06:13:52.6953782Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control
2025-09-07T06:13:52.6954895Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B
2025-09-07T06:13:52.6956188Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B
2025-09-07T06:13:52.6957505Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1
2025-09-07T06:13:52.6958704Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2
2025-09-07T06:13:52.6959803Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3
2025-09-07T06:13:52.6960937Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4
2025-09-07T06:13:52.6962391Z * [new branch] example-convert-torch.nn -> origin/example-convert-torch.nn
2025-09-07T06:13:52.6964257Z * [new branch] exclamaforte/add-contiguous-threshold -> origin/exclamaforte/add-contiguous-threshold
2025-09-07T06:13:52.6965426Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma
2025-09-07T06:13:52.6966826Z * [new branch] exclamaforte/bump-transformer-version -> origin/exclamaforte/bump-transformer-version
2025-09-07T06:13:52.6967858Z * [new branch] exclamaforte/clear-feedback-savers -> origin/exclamaforte/clear-feedback-savers
2025-09-07T06:13:52.6968995Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run
2025-09-07T06:13:52.6970385Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor
2025-09-07T06:13:52.6972184Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion
2025-09-07T06:13:52.6973548Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning
2025-09-07T06:13:52.6974928Z * [new branch] exclamaforte/fix-exhuastive-autotuning-reland -> origin/exclamaforte/fix-exhuastive-autotuning-reland
2025-09-07T06:13:52.6976493Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg
2025-09-07T06:13:52.6977747Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run
2025-09-07T06:13:52.6978817Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data
2025-09-07T06:13:52.6979734Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run
2025-09-07T06:13:52.6980776Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model
2025-09-07T06:13:52.6981825Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model
2025-09-07T06:13:52.6983299Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection
2025-09-07T06:13:52.6984390Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd
2025-09-07T06:13:52.6985625Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model
2025-09-07T06:13:52.6986926Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor
2025-09-07T06:13:52.6987958Z * [new branch] exclamaforte/max-autotune-ieee -> origin/exclamaforte/max-autotune-ieee
2025-09-07T06:13:52.6988984Z * [new branch] exclamaforte/memory-counter -> origin/exclamaforte/memory-counter
2025-09-07T06:13:52.6989961Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo
2025-09-07T06:13:52.6991196Z * [new branch] exclamaforte/profiler-combo -> origin/exclamaforte/profiler-combo
2025-09-07T06:13:52.6992303Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode
2025-09-07T06:13:52.6993504Z
* [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-09-07T06:13:52.6994758Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-09-07T06:13:52.6996189Z * [new branch] exclamforte/gemm-model-final -> origin/exclamforte/gemm-model-final 2025-09-07T06:13:52.6997867Z * [new branch] exec -> origin/exec 2025-09-07T06:13:52.6999028Z * [new branch] executorch-module-shim -> origin/executorch-module-shim 2025-09-07T06:13:52.7000304Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-09-07T06:13:52.7001511Z * [new branch] export-D58091437 -> origin/export-D58091437 2025-09-07T06:13:52.7002759Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-09-07T06:13:52.7003869Z * [new branch] export-D70112642 -> origin/export-D70112642 2025-09-07T06:13:52.7005198Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-09-07T06:13:52.7006666Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-09-07T06:13:52.7007738Z * [new branch] export-D75183591 -> origin/export-D75183591 2025-09-07T06:13:52.7008945Z * [new branch] export-D75617432 -> origin/export-D75617432 2025-09-07T06:13:52.7010120Z * [new branch] export-D75659965 -> origin/export-D75659965 2025-09-07T06:13:52.7011314Z * [new branch] export-D76080931 -> origin/export-D76080931 2025-09-07T06:13:52.7012693Z * [new branch] export-D76797250 -> origin/export-D76797250 2025-09-07T06:13:52.7013866Z * [new branch] export-D76885271 -> origin/export-D76885271 2025-09-07T06:13:52.7015046Z * [new branch] export-D76885620 -> origin/export-D76885620 2025-09-07T06:13:52.7016242Z * [new branch] export-D76936623 -> origin/export-D76936623 2025-09-07T06:13:52.7017508Z * [new branch] export-D76958268 -> origin/export-D76958268 2025-09-07T06:13:52.7018675Z * [new branch] export-D78375400 -> origin/export-D78375400 2025-09-07T06:13:52.7019820Z * [new branch] export-D78431305 -> 
origin/export-D78431305 2025-09-07T06:13:52.7021132Z * [new branch] export-D78580107 -> origin/export-D78580107 2025-09-07T06:13:52.7022264Z * [new branch] export-D78822171 -> origin/export-D78822171 2025-09-07T06:13:52.7023489Z * [new branch] export-D78822351 -> origin/export-D78822351 2025-09-07T06:13:52.7024774Z * [new branch] export-D78822507 -> origin/export-D78822507 2025-09-07T06:13:52.7025779Z * [new branch] export-D78826994 -> origin/export-D78826994 2025-09-07T06:13:52.7026889Z * [new branch] export-D78894324 -> origin/export-D78894324 2025-09-07T06:13:52.7028524Z * [new branch] export-D78929245 -> origin/export-D78929245 2025-09-07T06:13:52.7029414Z * [new branch] export-D78934925 -> origin/export-D78934925 2025-09-07T06:13:52.7030649Z * [new branch] export-D78953203 -> origin/export-D78953203 2025-09-07T06:13:52.7031881Z * [new branch] export-D78953229 -> origin/export-D78953229 2025-09-07T06:13:52.7032834Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-09-07T06:13:52.7033910Z * [new branch] export-D78957389 -> origin/export-D78957389 2025-09-07T06:13:52.7035058Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-09-07T06:13:52.7036219Z * [new branch] export-D79026433 -> origin/export-D79026433 2025-09-07T06:13:52.7037450Z * [new branch] export-D79230339 -> origin/export-D79230339 2025-09-07T06:13:52.7038620Z * [new branch] export-D79319835 -> origin/export-D79319835 2025-09-07T06:13:52.7039692Z * [new branch] export-D79328456 -> origin/export-D79328456 2025-09-07T06:13:52.7040928Z * [new branch] export-D79534608 -> origin/export-D79534608 2025-09-07T06:13:52.7042399Z * [new branch] export-D79785974 -> origin/export-D79785974 2025-09-07T06:13:52.7043507Z * [new branch] export-D80025417 -> origin/export-D80025417 2025-09-07T06:13:52.7044714Z * [new branch] export-D80120333 -> origin/export-D80120333 2025-09-07T06:13:52.7045977Z * [new branch] export-D80214882 -> origin/export-D80214882 2025-09-07T06:13:52.7047605Z * [new 
branch] export-D80319069 -> origin/export-D80319069 2025-09-07T06:13:52.7049076Z * [new branch] export-D80321215 -> origin/export-D80321215 2025-09-07T06:13:52.7050472Z * [new branch] export-D80503451 -> origin/export-D80503451 2025-09-07T06:13:52.7051621Z * [new branch] export-D80771648 -> origin/export-D80771648 2025-09-07T06:13:52.7052836Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-09-07T06:13:52.7054073Z * [new branch] export-D80948073 -> origin/export-D80948073 2025-09-07T06:13:52.7055432Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-09-07T06:13:52.7056593Z * [new branch] export-D80970483 -> origin/export-D80970483 2025-09-07T06:13:52.7057790Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-09-07T06:13:52.7058942Z * [new branch] export-D81060182 -> origin/export-D81060182 2025-09-07T06:13:52.7060156Z * [new branch] export-D81078973 -> origin/export-D81078973 2025-09-07T06:13:52.7061365Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-09-07T06:13:52.7062551Z * [new branch] export-D81284190 -> origin/export-D81284190 2025-09-07T06:13:52.7063892Z * [new branch] export-D81299840 -> origin/export-D81299840 2025-09-07T06:13:52.7065128Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-09-07T06:13:52.7066191Z * [new branch] export-D81698719 -> origin/export-D81698719 2025-09-07T06:13:52.7067329Z * [new branch] export-D81747409 -> origin/export-D81747409 2025-09-07T06:13:52.7068778Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-09-07T06:13:52.7070343Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-09-07T06:13:52.7071327Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-09-07T06:13:52.7072788Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-09-07T06:13:52.7074399Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-09-07T06:13:52.7075686Z * [new branch] fca -> origin/fca 
2025-09-07T06:13:52.7076762Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-09-07T06:13:52.7077907Z * [new branch] fca5 -> origin/fca5 2025-09-07T06:13:52.7079699Z * [new branch] feature/function-numa-binding -> origin/feature/function-numa-binding 2025-09-07T06:13:52.7080826Z * [new branch] feature/function-numa-binding-take2 -> origin/feature/function-numa-binding-take2 2025-09-07T06:13:52.7081769Z * [new branch] feature/numa-nproc-fix -> origin/feature/numa-nproc-fix 2025-09-07T06:13:52.7082943Z * [new branch] feature/numa-signpost-serialize -> origin/feature/numa-signpost-serialize 2025-09-07T06:13:52.7083995Z * [new branch] feature/parallel-numa-binding -> origin/feature/parallel-numa-binding 2025-09-07T06:13:52.7085656Z * [new branch] fengyuan/external-proj -> origin/fengyuan/external-proj 2025-09-07T06:13:52.7086888Z * [new branch] fengyuan/out-of-tree-xpu-ops-improve-test -> origin/fengyuan/out-of-tree-xpu-ops-improve-test 2025-09-07T06:13:52.7087984Z * [new branch] fengyuan/out-of-tree-xpu-ops-remove-dtype -> origin/fengyuan/out-of-tree-xpu-ops-remove-dtype 2025-09-07T06:13:52.7088835Z * [new branch] fengyuan/test-xpu -> origin/fengyuan/test-xpu 2025-09-07T06:13:52.7091003Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-09-07T06:13:52.7092728Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-09-07T06:13:52.7094440Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-09-07T06:13:52.7095587Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-09-07T06:13:52.7096670Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-09-07T06:13:52.7097759Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-09-07T06:13:52.7098941Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-09-07T06:13:52.7100090Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-09-07T06:13:52.7101211Z * [new branch] 
findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-09-07T06:13:52.7102377Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-09-07T06:13:52.7103583Z * [new branch] fix -> origin/fix 2025-09-07T06:13:52.7104971Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-09-07T06:13:52.7106033Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-09-07T06:13:52.7107179Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-09-07T06:13:52.7108481Z * [new branch] fix-inductor-periodic-0528 -> origin/fix-inductor-periodic-0528 2025-09-07T06:13:52.7109499Z * [new branch] fix-mps-benchmark -> origin/fix-mps-benchmark 2025-09-07T06:13:52.7110703Z * [new branch] fix-rlease-feature-template -> origin/fix-rlease-feature-template 2025-09-07T06:13:52.7111928Z * [new branch] fix-run-condition-upload-results -> origin/fix-run-condition-upload-results 2025-09-07T06:13:52.7112917Z * [new branch] fix-torchbench -> origin/fix-torchbench 2025-09-07T06:13:52.7114026Z * [new branch] fix_153389 -> origin/fix_153389 2025-09-07T06:13:52.7115331Z * [new branch] fix_fsdp_rs_bucket2 -> origin/fix_fsdp_rs_bucket2 2025-09-07T06:13:52.7116416Z * [new branch] fix_inductor_peridic_tests -> origin/fix_inductor_peridic_tests 2025-09-07T06:13:52.7117488Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-09-07T06:13:52.7118691Z * [new branch] fixes-triage -> origin/fixes-triage 2025-09-07T06:13:52.7119806Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-09-07T06:13:52.7120940Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-09-07T06:13:52.7122059Z * [new branch] flex-flash -> origin/flex-flash 2025-09-07T06:13:52.7123219Z * [new branch] flex-lowering -> origin/flex-lowering 2025-09-07T06:13:52.7124338Z * [new branch] flex-warning -> origin/flex-warning 2025-09-07T06:13:52.7125576Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 
2025-09-07T06:13:52.7127058Z * [new branch] flex_flash -> origin/flex_flash 2025-09-07T06:13:52.7128232Z * [new branch] flexdecode-gqa-groups -> origin/flexdecode-gqa-groups 2025-09-07T06:13:52.7130021Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-09-07T06:13:52.7131094Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-09-07T06:13:52.7132679Z * [new branch] fsdpv2_3d -> origin/fsdpv2_3d 2025-09-07T06:13:52.7134031Z * [new branch] fsdpv2_3d_m1 -> origin/fsdpv2_3d_m1 2025-09-07T06:13:52.7135209Z * [new branch] fx_cpp -> origin/fx_cpp 2025-09-07T06:13:52.7136915Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-09-07T06:13:52.7139892Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-09-07T06:13:52.7141005Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-09-07T06:13:52.7142996Z * [new branch] gh/CaoE/2/base -> origin/gh/CaoE/2/base 2025-09-07T06:13:52.7144181Z * [new branch] gh/CaoE/2/head -> origin/gh/CaoE/2/head 2025-09-07T06:13:52.7145316Z * [new branch] gh/CaoE/2/orig -> origin/gh/CaoE/2/orig 2025-09-07T06:13:52.7147427Z * [new branch] gh/ColinPeppler/79/base -> origin/gh/ColinPeppler/79/base 2025-09-07T06:13:52.7148575Z * [new branch] gh/ColinPeppler/79/head -> origin/gh/ColinPeppler/79/head 2025-09-07T06:13:52.7150400Z * [new branch] gh/ColinPeppler/79/orig -> origin/gh/ColinPeppler/79/orig 2025-09-07T06:13:52.7152358Z * [new branch] gh/ColinPeppler/80/base -> origin/gh/ColinPeppler/80/base 2025-09-07T06:13:52.7153543Z * [new branch] gh/ColinPeppler/80/head -> origin/gh/ColinPeppler/80/head 2025-09-07T06:13:52.7154736Z * [new branch] gh/ColinPeppler/80/orig -> origin/gh/ColinPeppler/80/orig 2025-09-07T06:13:52.7156969Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-09-07T06:13:52.7158038Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-09-07T06:13:52.7159928Z * [new branch] gh/EikanWang/80/base -> 
origin/gh/EikanWang/80/base 2025-09-07T06:13:52.7161088Z * [new branch] gh/EikanWang/80/head -> origin/gh/EikanWang/80/head 2025-09-07T06:13:52.7162326Z * [new branch] gh/EikanWang/80/orig -> origin/gh/EikanWang/80/orig 2025-09-07T06:13:52.7163994Z * [new branch] gh/EikanWang/81/base -> origin/gh/EikanWang/81/base 2025-09-07T06:13:52.7165059Z * [new branch] gh/EikanWang/81/head -> origin/gh/EikanWang/81/head 2025-09-07T06:13:52.7166371Z * [new branch] gh/EikanWang/81/orig -> origin/gh/EikanWang/81/orig 2025-09-07T06:13:52.7167880Z * [new branch] gh/EikanWang/82/base -> origin/gh/EikanWang/82/base 2025-09-07T06:13:52.7168976Z * [new branch] gh/EikanWang/82/head -> origin/gh/EikanWang/82/head 2025-09-07T06:13:52.7170151Z * [new branch] gh/EikanWang/82/orig -> origin/gh/EikanWang/82/orig 2025-09-07T06:13:52.7172970Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-09-07T06:13:52.7174090Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-09-07T06:13:52.7176199Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-09-07T06:13:52.7177305Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-09-07T06:13:52.7178484Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-09-07T06:13:52.7180222Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-09-07T06:13:52.7181400Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-09-07T06:13:52.7182578Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-09-07T06:13:52.7184472Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-09-07T06:13:52.7185525Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-09-07T06:13:52.7186678Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-09-07T06:13:52.7188257Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-09-07T06:13:52.7189293Z * [new branch] gh/H-Huang/182/head -> 
origin/gh/H-Huang/182/head 2025-09-07T06:13:52.7190428Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-09-07T06:13:52.7192189Z * [new branch] gh/H-Huang/187/base -> origin/gh/H-Huang/187/base 2025-09-07T06:13:52.7193205Z * [new branch] gh/H-Huang/187/head -> origin/gh/H-Huang/187/head 2025-09-07T06:13:52.7194366Z * [new branch] gh/H-Huang/187/orig -> origin/gh/H-Huang/187/orig 2025-09-07T06:13:52.7195988Z * [new branch] gh/H-Huang/202/base -> origin/gh/H-Huang/202/base 2025-09-07T06:13:52.7197160Z * [new branch] gh/H-Huang/202/head -> origin/gh/H-Huang/202/head 2025-09-07T06:13:52.7198281Z * [new branch] gh/H-Huang/202/orig -> origin/gh/H-Huang/202/orig 2025-09-07T06:13:52.7199998Z * [new branch] gh/H-Huang/203/base -> origin/gh/H-Huang/203/base 2025-09-07T06:13:52.7201081Z * [new branch] gh/H-Huang/203/head -> origin/gh/H-Huang/203/head 2025-09-07T06:13:52.7202184Z * [new branch] gh/H-Huang/203/orig -> origin/gh/H-Huang/203/orig 2025-09-07T06:13:52.7203915Z * [new branch] gh/H-Huang/204/base -> origin/gh/H-Huang/204/base 2025-09-07T06:13:52.7204989Z * [new branch] gh/H-Huang/204/head -> origin/gh/H-Huang/204/head 2025-09-07T06:13:52.7206114Z * [new branch] gh/H-Huang/204/orig -> origin/gh/H-Huang/204/orig 2025-09-07T06:13:52.7207794Z * [new branch] gh/H-Huang/205/base -> origin/gh/H-Huang/205/base 2025-09-07T06:13:52.7208911Z * [new branch] gh/H-Huang/205/head -> origin/gh/H-Huang/205/head 2025-09-07T06:13:52.7209988Z * [new branch] gh/H-Huang/205/orig -> origin/gh/H-Huang/205/orig 2025-09-07T06:13:52.7211887Z * [new branch] gh/H-Huang/206/base -> origin/gh/H-Huang/206/base 2025-09-07T06:13:52.7213120Z * [new branch] gh/H-Huang/206/head -> origin/gh/H-Huang/206/head 2025-09-07T06:13:52.7214384Z * [new branch] gh/H-Huang/206/orig -> origin/gh/H-Huang/206/orig 2025-09-07T06:13:52.7215997Z * [new branch] gh/H-Huang/207/base -> origin/gh/H-Huang/207/base 2025-09-07T06:13:52.7217091Z * [new branch] gh/H-Huang/207/head -> 
origin/gh/H-Huang/207/head 2025-09-07T06:13:52.7218222Z * [new branch] gh/H-Huang/207/orig -> origin/gh/H-Huang/207/orig 2025-09-07T06:13:52.7219920Z * [new branch] gh/H-Huang/208/base -> origin/gh/H-Huang/208/base 2025-09-07T06:13:52.7221010Z * [new branch] gh/H-Huang/208/head -> origin/gh/H-Huang/208/head 2025-09-07T06:13:52.7222209Z * [new branch] gh/H-Huang/208/orig -> origin/gh/H-Huang/208/orig 2025-09-07T06:13:52.7223855Z * [new branch] gh/H-Huang/209/base -> origin/gh/H-Huang/209/base 2025-09-07T06:13:52.7225046Z * [new branch] gh/H-Huang/209/head -> origin/gh/H-Huang/209/head 2025-09-07T06:13:52.7226197Z * [new branch] gh/H-Huang/209/orig -> origin/gh/H-Huang/209/orig 2025-09-07T06:13:52.7227789Z * [new branch] gh/H-Huang/210/base -> origin/gh/H-Huang/210/base 2025-09-07T06:13:52.7229040Z * [new branch] gh/H-Huang/210/head -> origin/gh/H-Huang/210/head 2025-09-07T06:13:52.7230186Z * [new branch] gh/H-Huang/210/orig -> origin/gh/H-Huang/210/orig 2025-09-07T06:13:52.7231904Z * [new branch] gh/H-Huang/211/base -> origin/gh/H-Huang/211/base 2025-09-07T06:13:52.7232954Z * [new branch] gh/H-Huang/211/head -> origin/gh/H-Huang/211/head 2025-09-07T06:13:52.7234070Z * [new branch] gh/H-Huang/211/orig -> origin/gh/H-Huang/211/orig 2025-09-07T06:13:52.7235736Z * [new branch] gh/H-Huang/212/base -> origin/gh/H-Huang/212/base 2025-09-07T06:13:52.7236798Z * [new branch] gh/H-Huang/212/head -> origin/gh/H-Huang/212/head 2025-09-07T06:13:52.7237910Z * [new branch] gh/H-Huang/212/orig -> origin/gh/H-Huang/212/orig 2025-09-07T06:13:52.7240203Z * [new branch] gh/H-Huang/213/base -> origin/gh/H-Huang/213/base 2025-09-07T06:13:52.7241331Z * [new branch] gh/H-Huang/213/head -> origin/gh/H-Huang/213/head 2025-09-07T06:13:52.7242408Z * [new branch] gh/H-Huang/213/orig -> origin/gh/H-Huang/213/orig 2025-09-07T06:13:52.7244096Z * [new branch] gh/H-Huang/214/base -> origin/gh/H-Huang/214/base 2025-09-07T06:13:52.7245180Z * [new branch] gh/H-Huang/214/head -> 
origin/gh/H-Huang/214/head 2025-09-07T06:13:52.7246299Z * [new branch] gh/H-Huang/214/orig -> origin/gh/H-Huang/214/orig 2025-09-07T06:13:52.7248370Z * [new branch] gh/IvanKobzarev/112/base -> origin/gh/IvanKobzarev/112/base 2025-09-07T06:13:52.7249918Z * [new branch] gh/IvanKobzarev/112/head -> origin/gh/IvanKobzarev/112/head 2025-09-07T06:13:52.7251107Z * [new branch] gh/IvanKobzarev/112/orig -> origin/gh/IvanKobzarev/112/orig 2025-09-07T06:13:52.7253032Z * [new branch] gh/IvanKobzarev/115/base -> origin/gh/IvanKobzarev/115/base 2025-09-07T06:13:52.7254175Z * [new branch] gh/IvanKobzarev/115/head -> origin/gh/IvanKobzarev/115/head 2025-09-07T06:13:52.7255400Z * [new branch] gh/IvanKobzarev/115/orig -> origin/gh/IvanKobzarev/115/orig 2025-09-07T06:13:52.7257516Z * [new branch] gh/IvanKobzarev/116/base -> origin/gh/IvanKobzarev/116/base 2025-09-07T06:13:52.7258780Z * [new branch] gh/IvanKobzarev/116/head -> origin/gh/IvanKobzarev/116/head 2025-09-07T06:13:52.7259992Z * [new branch] gh/IvanKobzarev/116/orig -> origin/gh/IvanKobzarev/116/orig 2025-09-07T06:13:52.7261829Z * [new branch] gh/IvanKobzarev/118/base -> origin/gh/IvanKobzarev/118/base 2025-09-07T06:13:52.7263222Z * [new branch] gh/IvanKobzarev/118/head -> origin/gh/IvanKobzarev/118/head 2025-09-07T06:13:52.7264258Z * [new branch] gh/IvanKobzarev/118/orig -> origin/gh/IvanKobzarev/118/orig 2025-09-07T06:13:52.7266128Z * [new branch] gh/IvanKobzarev/126/base -> origin/gh/IvanKobzarev/126/base 2025-09-07T06:13:52.7267266Z * [new branch] gh/IvanKobzarev/126/head -> origin/gh/IvanKobzarev/126/head 2025-09-07T06:13:52.7268419Z * [new branch] gh/IvanKobzarev/126/orig -> origin/gh/IvanKobzarev/126/orig 2025-09-07T06:13:52.7270207Z * [new branch] gh/IvanKobzarev/127/base -> origin/gh/IvanKobzarev/127/base 2025-09-07T06:13:52.7271263Z * [new branch] gh/IvanKobzarev/127/head -> origin/gh/IvanKobzarev/127/head 2025-09-07T06:13:52.7272395Z * [new branch] gh/IvanKobzarev/127/orig -> origin/gh/IvanKobzarev/127/orig 
2025-09-07T06:13:52.7274113Z * [new branch] gh/IvanKobzarev/128/base -> origin/gh/IvanKobzarev/128/base 2025-09-07T06:13:52.7275202Z * [new branch] gh/IvanKobzarev/128/head -> origin/gh/IvanKobzarev/128/head 2025-09-07T06:13:52.7276335Z * [new branch] gh/IvanKobzarev/128/orig -> origin/gh/IvanKobzarev/128/orig 2025-09-07T06:13:52.7278105Z * [new branch] gh/IvanKobzarev/132/base -> origin/gh/IvanKobzarev/132/base 2025-09-07T06:13:52.7279254Z * [new branch] gh/IvanKobzarev/132/head -> origin/gh/IvanKobzarev/132/head 2025-09-07T06:13:52.7280390Z * [new branch] gh/IvanKobzarev/132/orig -> origin/gh/IvanKobzarev/132/orig 2025-09-07T06:13:52.7282562Z * [new branch] gh/IvanKobzarev/133/base -> origin/gh/IvanKobzarev/133/base 2025-09-07T06:13:52.7283977Z * [new branch] gh/IvanKobzarev/133/head -> origin/gh/IvanKobzarev/133/head 2025-09-07T06:13:52.7285094Z * [new branch] gh/IvanKobzarev/133/orig -> origin/gh/IvanKobzarev/133/orig 2025-09-07T06:13:52.7286730Z * [new branch] gh/IvanKobzarev/134/base -> origin/gh/IvanKobzarev/134/base 2025-09-07T06:13:52.7287817Z * [new branch] gh/IvanKobzarev/134/head -> origin/gh/IvanKobzarev/134/head 2025-09-07T06:13:52.7288907Z * [new branch] gh/IvanKobzarev/134/orig -> origin/gh/IvanKobzarev/134/orig 2025-09-07T06:13:52.7290935Z * [new branch] gh/IvanKobzarev/135/base -> origin/gh/IvanKobzarev/135/base 2025-09-07T06:13:52.7292353Z * [new branch] gh/IvanKobzarev/135/head -> origin/gh/IvanKobzarev/135/head 2025-09-07T06:13:52.7293589Z * [new branch] gh/IvanKobzarev/135/orig -> origin/gh/IvanKobzarev/135/orig 2025-09-07T06:13:52.7295479Z * [new branch] gh/IvanKobzarev/136/base -> origin/gh/IvanKobzarev/136/base 2025-09-07T06:13:52.7296623Z * [new branch] gh/IvanKobzarev/136/head -> origin/gh/IvanKobzarev/136/head 2025-09-07T06:13:52.7297853Z * [new branch] gh/IvanKobzarev/136/orig -> origin/gh/IvanKobzarev/136/orig 2025-09-07T06:13:52.7299306Z * [new branch] gh/IvanKobzarev/137/base -> origin/gh/IvanKobzarev/137/base 
2025-09-07T06:13:52.7300609Z * [new branch] gh/IvanKobzarev/137/head -> origin/gh/IvanKobzarev/137/head 2025-09-07T06:13:52.7301779Z * [new branch] gh/IvanKobzarev/137/orig -> origin/gh/IvanKobzarev/137/orig 2025-09-07T06:13:52.7303454Z * [new branch] gh/IvanKobzarev/138/base -> origin/gh/IvanKobzarev/138/base 2025-09-07T06:13:52.7304678Z * [new branch] gh/IvanKobzarev/138/head -> origin/gh/IvanKobzarev/138/head 2025-09-07T06:13:52.7305955Z * [new branch] gh/IvanKobzarev/138/orig -> origin/gh/IvanKobzarev/138/orig 2025-09-07T06:13:52.7307634Z * [new branch] gh/IvanKobzarev/139/base -> origin/gh/IvanKobzarev/139/base 2025-09-07T06:13:52.7308718Z * [new branch] gh/IvanKobzarev/139/head -> origin/gh/IvanKobzarev/139/head 2025-09-07T06:13:52.7309947Z * [new branch] gh/IvanKobzarev/139/orig -> origin/gh/IvanKobzarev/139/orig 2025-09-07T06:13:52.7311738Z * [new branch] gh/IvanKobzarev/140/base -> origin/gh/IvanKobzarev/140/base 2025-09-07T06:13:52.7312784Z * [new branch] gh/IvanKobzarev/140/head -> origin/gh/IvanKobzarev/140/head 2025-09-07T06:13:52.7313969Z * [new branch] gh/IvanKobzarev/140/orig -> origin/gh/IvanKobzarev/140/orig 2025-09-07T06:13:52.7316186Z * [new branch] gh/IvanKobzarev/141/base -> origin/gh/IvanKobzarev/141/base 2025-09-07T06:13:52.7317416Z * [new branch] gh/IvanKobzarev/141/head -> origin/gh/IvanKobzarev/141/head 2025-09-07T06:13:52.7319793Z * [new branch] gh/IvanKobzarev/141/orig -> origin/gh/IvanKobzarev/141/orig 2025-09-07T06:13:52.7321136Z * [new branch] gh/IvanKobzarev/142/base -> origin/gh/IvanKobzarev/142/base 2025-09-07T06:13:52.7321824Z * [new branch] gh/IvanKobzarev/142/head -> origin/gh/IvanKobzarev/142/head 2025-09-07T06:13:52.7322899Z * [new branch] gh/IvanKobzarev/142/orig -> origin/gh/IvanKobzarev/142/orig 2025-09-07T06:13:52.7324723Z * [new branch] gh/IvanKobzarev/143/base -> origin/gh/IvanKobzarev/143/base 2025-09-07T06:13:52.7325872Z * [new branch] gh/IvanKobzarev/143/head -> origin/gh/IvanKobzarev/143/head 
2025-09-07T06:13:52.7327051Z * [new branch] gh/IvanKobzarev/143/orig -> origin/gh/IvanKobzarev/143/orig 2025-09-07T06:13:52.7328901Z * [new branch] gh/IvanKobzarev/144/base -> origin/gh/IvanKobzarev/144/base 2025-09-07T06:13:52.7329971Z * [new branch] gh/IvanKobzarev/144/head -> origin/gh/IvanKobzarev/144/head 2025-09-07T06:13:52.7331135Z * [new branch] gh/IvanKobzarev/144/orig -> origin/gh/IvanKobzarev/144/orig 2025-09-07T06:13:52.7333277Z * [new branch] gh/IvanKobzarev/145/base -> origin/gh/IvanKobzarev/145/base 2025-09-07T06:13:52.7334394Z * [new branch] gh/IvanKobzarev/145/head -> origin/gh/IvanKobzarev/145/head 2025-09-07T06:13:52.7335608Z * [new branch] gh/IvanKobzarev/145/orig -> origin/gh/IvanKobzarev/145/orig 2025-09-07T06:13:52.7337381Z * [new branch] gh/IvanKobzarev/146/base -> origin/gh/IvanKobzarev/146/base 2025-09-07T06:13:52.7338536Z * [new branch] gh/IvanKobzarev/146/head -> origin/gh/IvanKobzarev/146/head 2025-09-07T06:13:52.7339771Z * [new branch] gh/IvanKobzarev/146/orig -> origin/gh/IvanKobzarev/146/orig 2025-09-07T06:13:52.7341862Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-09-07T06:13:52.7343119Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-09-07T06:13:52.7344745Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-09-07T06:13:52.7345788Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-09-07T06:13:52.7347681Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-09-07T06:13:52.7348990Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-09-07T06:13:52.7352285Z * [new branch] gh/PaliC/1/base -> origin/gh/PaliC/1/base 2025-09-07T06:13:52.7353434Z * [new branch] gh/PaliC/1/head -> origin/gh/PaliC/1/head 2025-09-07T06:13:52.7354613Z * [new branch] gh/PaliC/1/orig -> origin/gh/PaliC/1/orig 2025-09-07T06:13:52.7356371Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 
2025-09-07T06:13:52.7357495Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-09-07T06:13:52.7358748Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-09-07T06:13:52.7360592Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-09-07T06:13:52.7361663Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-09-07T06:13:52.7362797Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-09-07T06:13:52.7364393Z * [new branch] gh/PaliC/2/base -> origin/gh/PaliC/2/base 2025-09-07T06:13:52.7365437Z * [new branch] gh/PaliC/2/head -> origin/gh/PaliC/2/head 2025-09-07T06:13:52.7366541Z * [new branch] gh/PaliC/2/orig -> origin/gh/PaliC/2/orig 2025-09-07T06:13:52.7368380Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-09-07T06:13:52.7369450Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-09-07T06:13:52.7370573Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-09-07T06:13:52.7372525Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-09-07T06:13:52.7373787Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-09-07T06:13:52.7375005Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-09-07T06:13:52.7376634Z * [new branch] gh/PaliC/22/base -> origin/gh/PaliC/22/base 2025-09-07T06:13:52.7377739Z * [new branch] gh/PaliC/22/head -> origin/gh/PaliC/22/head 2025-09-07T06:13:52.7378936Z * [new branch] gh/PaliC/22/orig -> origin/gh/PaliC/22/orig 2025-09-07T06:13:52.7380517Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-09-07T06:13:52.7381619Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-09-07T06:13:52.7382796Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-09-07T06:13:52.7384588Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-09-07T06:13:52.7385635Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-09-07T06:13:52.7386740Z * [new branch] gh/PaliC/24/orig -> 
origin/gh/PaliC/24/orig 2025-09-07T06:13:52.7388801Z * [new branch] gh/PaulZhang12/17/base -> origin/gh/PaulZhang12/17/base 2025-09-07T06:13:52.7389868Z * [new branch] gh/PaulZhang12/17/head -> origin/gh/PaulZhang12/17/head 2025-09-07T06:13:52.7391710Z * [new branch] gh/PaulZhang12/20/base -> origin/gh/PaulZhang12/20/base 2025-09-07T06:13:52.7392732Z * [new branch] gh/PaulZhang12/20/head -> origin/gh/PaulZhang12/20/head 2025-09-07T06:13:52.7393916Z * [new branch] gh/PaulZhang12/20/orig -> origin/gh/PaulZhang12/20/orig 2025-09-07T06:13:52.7395568Z * [new branch] gh/PaulZhang12/21/base -> origin/gh/PaulZhang12/21/base 2025-09-07T06:13:52.7396739Z * [new branch] gh/PaulZhang12/21/head -> origin/gh/PaulZhang12/21/head 2025-09-07T06:13:52.7397864Z * [new branch] gh/PaulZhang12/21/orig -> origin/gh/PaulZhang12/21/orig 2025-09-07T06:13:52.7399513Z * [new branch] gh/PaulZhang12/22/base -> origin/gh/PaulZhang12/22/base 2025-09-07T06:13:52.7400587Z * [new branch] gh/PaulZhang12/22/head -> origin/gh/PaulZhang12/22/head 2025-09-07T06:13:52.7401687Z * [new branch] gh/PaulZhang12/22/orig -> origin/gh/PaulZhang12/22/orig 2025-09-07T06:13:52.7403321Z * [new branch] gh/PaulZhang12/23/base -> origin/gh/PaulZhang12/23/base 2025-09-07T06:13:52.7404417Z * [new branch] gh/PaulZhang12/23/head -> origin/gh/PaulZhang12/23/head 2025-09-07T06:13:52.7405525Z * [new branch] gh/PaulZhang12/23/orig -> origin/gh/PaulZhang12/23/orig 2025-09-07T06:13:52.7407056Z * [new branch] gh/PaulZhang12/24/base -> origin/gh/PaulZhang12/24/base 2025-09-07T06:13:52.7408204Z * [new branch] gh/PaulZhang12/24/head -> origin/gh/PaulZhang12/24/head 2025-09-07T06:13:52.7409390Z * [new branch] gh/PaulZhang12/24/orig -> origin/gh/PaulZhang12/24/orig 2025-09-07T06:13:52.7411007Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-09-07T06:13:52.7412382Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-09-07T06:13:52.7413560Z * [new branch] gh/PaulZhang12/25/orig -> 
origin/gh/PaulZhang12/25/orig 2025-09-07T06:13:52.7415593Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-09-07T06:13:52.7416679Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-09-07T06:13:52.7419194Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-09-07T06:13:52.7420730Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-09-07T06:13:52.7422079Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-09-07T06:13:52.7423925Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-09-07T06:13:52.7425837Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-09-07T06:13:52.7426887Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-09-07T06:13:52.7428572Z * [new branch] gh/StrongerXi/133/base -> origin/gh/StrongerXi/133/base 2025-09-07T06:13:52.7429624Z * [new branch] gh/StrongerXi/133/head -> origin/gh/StrongerXi/133/head 2025-09-07T06:13:52.7430824Z * [new branch] gh/StrongerXi/133/orig -> origin/gh/StrongerXi/133/orig 2025-09-07T06:13:52.7432395Z * [new branch] gh/StrongerXi/134/base -> origin/gh/StrongerXi/134/base 2025-09-07T06:13:52.7433501Z * [new branch] gh/StrongerXi/134/head -> origin/gh/StrongerXi/134/head 2025-09-07T06:13:52.7434632Z * [new branch] gh/StrongerXi/134/orig -> origin/gh/StrongerXi/134/orig 2025-09-07T06:13:52.7436244Z * [new branch] gh/StrongerXi/136/base -> origin/gh/StrongerXi/136/base 2025-09-07T06:13:52.7437255Z * [new branch] gh/StrongerXi/136/head -> origin/gh/StrongerXi/136/head 2025-09-07T06:13:52.7438387Z * [new branch] gh/StrongerXi/136/orig -> origin/gh/StrongerXi/136/orig 2025-09-07T06:13:52.7439911Z * [new branch] gh/StrongerXi/137/base -> origin/gh/StrongerXi/137/base 2025-09-07T06:13:52.7440965Z * [new branch] gh/StrongerXi/137/head -> origin/gh/StrongerXi/137/head 2025-09-07T06:13:52.7442129Z * [new branch] 
gh/StrongerXi/137/orig -> origin/gh/StrongerXi/137/orig 2025-09-07T06:13:52.7443752Z * [new branch] gh/StrongerXi/138/base -> origin/gh/StrongerXi/138/base 2025-09-07T06:13:52.7444823Z * [new branch] gh/StrongerXi/138/head -> origin/gh/StrongerXi/138/head 2025-09-07T06:13:52.7445957Z * [new branch] gh/StrongerXi/138/orig -> origin/gh/StrongerXi/138/orig 2025-09-07T06:13:52.7447540Z * [new branch] gh/StrongerXi/139/base -> origin/gh/StrongerXi/139/base 2025-09-07T06:13:52.7448589Z * [new branch] gh/StrongerXi/139/head -> origin/gh/StrongerXi/139/head 2025-09-07T06:13:52.7450413Z * [new branch] gh/StrongerXi/139/orig -> origin/gh/StrongerXi/139/orig 2025-09-07T06:13:52.7452103Z * [new branch] gh/StrongerXi/140/base -> origin/gh/StrongerXi/140/base 2025-09-07T06:13:52.7453233Z * [new branch] gh/StrongerXi/140/head -> origin/gh/StrongerXi/140/head 2025-09-07T06:13:52.7454643Z * [new branch] gh/StrongerXi/140/orig -> origin/gh/StrongerXi/140/orig 2025-09-07T06:13:52.7456472Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-09-07T06:13:52.7457407Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-09-07T06:13:52.7458947Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-09-07T06:13:52.7460036Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-09-07T06:13:52.7462177Z * [new branch] gh/XilunWu/133/base -> origin/gh/XilunWu/133/base 2025-09-07T06:13:52.7463347Z * [new branch] gh/XilunWu/133/head -> origin/gh/XilunWu/133/head 2025-09-07T06:13:52.7464618Z * [new branch] gh/XilunWu/133/orig -> origin/gh/XilunWu/133/orig 2025-09-07T06:13:52.7466281Z * [new branch] gh/XilunWu/139/base -> origin/gh/XilunWu/139/base 2025-09-07T06:13:52.7467329Z * [new branch] gh/XilunWu/139/head -> origin/gh/XilunWu/139/head 2025-09-07T06:13:52.7468385Z * [new branch] gh/XilunWu/139/orig -> origin/gh/XilunWu/139/orig 2025-09-07T06:13:52.7470114Z * [new branch] gh/XilunWu/143/base -> 
origin/gh/XilunWu/143/base 2025-09-07T06:13:52.7471280Z * [new branch] gh/XilunWu/143/head -> origin/gh/XilunWu/143/head 2025-09-07T06:13:52.7472410Z * [new branch] gh/XilunWu/143/orig -> origin/gh/XilunWu/143/orig 2025-09-07T06:13:52.7474234Z * [new branch] gh/XilunWu/144/base -> origin/gh/XilunWu/144/base 2025-09-07T06:13:52.7475337Z * [new branch] gh/XilunWu/144/head -> origin/gh/XilunWu/144/head 2025-09-07T06:13:52.7476468Z * [new branch] gh/XilunWu/144/orig -> origin/gh/XilunWu/144/orig 2025-09-07T06:13:52.7478214Z * [new branch] gh/XilunWu/145/base -> origin/gh/XilunWu/145/base 2025-09-07T06:13:52.7479210Z * [new branch] gh/XilunWu/145/head -> origin/gh/XilunWu/145/head 2025-09-07T06:13:52.7480310Z * [new branch] gh/XilunWu/145/orig -> origin/gh/XilunWu/145/orig 2025-09-07T06:13:52.7481834Z * [new branch] gh/XilunWu/146/base -> origin/gh/XilunWu/146/base 2025-09-07T06:13:52.7482876Z * [new branch] gh/XilunWu/146/head -> origin/gh/XilunWu/146/head 2025-09-07T06:13:52.7484020Z * [new branch] gh/XilunWu/146/orig -> origin/gh/XilunWu/146/orig 2025-09-07T06:13:52.7485670Z * [new branch] gh/XilunWu/147/base -> origin/gh/XilunWu/147/base 2025-09-07T06:13:52.7486749Z * [new branch] gh/XilunWu/147/head -> origin/gh/XilunWu/147/head 2025-09-07T06:13:52.7487863Z * [new branch] gh/XilunWu/147/orig -> origin/gh/XilunWu/147/orig 2025-09-07T06:13:52.7489357Z * [new branch] gh/XilunWu/148/base -> origin/gh/XilunWu/148/base 2025-09-07T06:13:52.7490460Z * [new branch] gh/XilunWu/148/head -> origin/gh/XilunWu/148/head 2025-09-07T06:13:52.7491618Z * [new branch] gh/XilunWu/148/orig -> origin/gh/XilunWu/148/orig 2025-09-07T06:13:52.7493381Z * [new branch] gh/XilunWu/149/base -> origin/gh/XilunWu/149/base 2025-09-07T06:13:52.7494469Z * [new branch] gh/XilunWu/149/head -> origin/gh/XilunWu/149/head 2025-09-07T06:13:52.7495683Z * [new branch] gh/XilunWu/149/orig -> origin/gh/XilunWu/149/orig 2025-09-07T06:13:52.7497203Z * [new branch] gh/XilunWu/150/base -> 
origin/gh/XilunWu/150/base 2025-09-07T06:13:52.7498434Z * [new branch] gh/XilunWu/150/head -> origin/gh/XilunWu/150/head 2025-09-07T06:13:52.7499496Z * [new branch] gh/XilunWu/150/orig -> origin/gh/XilunWu/150/orig 2025-09-07T06:13:52.7501195Z * [new branch] gh/XilunWu/151/base -> origin/gh/XilunWu/151/base 2025-09-07T06:13:52.7502494Z * [new branch] gh/XilunWu/151/head -> origin/gh/XilunWu/151/head 2025-09-07T06:13:52.7503556Z * [new branch] gh/XilunWu/151/orig -> origin/gh/XilunWu/151/orig 2025-09-07T06:13:52.7505243Z * [new branch] gh/XilunWu/152/base -> origin/gh/XilunWu/152/base 2025-09-07T06:13:52.7506230Z * [new branch] gh/XilunWu/152/head -> origin/gh/XilunWu/152/head 2025-09-07T06:13:52.7507261Z * [new branch] gh/XilunWu/152/orig -> origin/gh/XilunWu/152/orig 2025-09-07T06:13:52.7509125Z * [new branch] gh/XilunWu/153/base -> origin/gh/XilunWu/153/base 2025-09-07T06:13:52.7510388Z * [new branch] gh/XilunWu/153/head -> origin/gh/XilunWu/153/head 2025-09-07T06:13:52.7511523Z * [new branch] gh/XilunWu/153/orig -> origin/gh/XilunWu/153/orig 2025-09-07T06:13:52.7513336Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-09-07T06:13:52.7514335Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-09-07T06:13:52.7515526Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-09-07T06:13:52.7517301Z * [new branch] gh/XilunWu/161/base -> origin/gh/XilunWu/161/base 2025-09-07T06:13:52.7518300Z * [new branch] gh/XilunWu/161/head -> origin/gh/XilunWu/161/head 2025-09-07T06:13:52.7519434Z * [new branch] gh/XilunWu/161/orig -> origin/gh/XilunWu/161/orig 2025-09-07T06:13:52.7521204Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-09-07T06:13:52.7522316Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-09-07T06:13:52.7523493Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-09-07T06:13:52.7525624Z * [new branch] gh/XilunWu/164/base -> 
origin/gh/XilunWu/164/base 2025-09-07T06:13:52.7526842Z * [new branch] gh/XilunWu/164/head -> origin/gh/XilunWu/164/head 2025-09-07T06:13:52.7527976Z * [new branch] gh/XilunWu/164/orig -> origin/gh/XilunWu/164/orig 2025-09-07T06:13:52.7529815Z * [new branch] gh/XilunWu/165/base -> origin/gh/XilunWu/165/base 2025-09-07T06:13:52.7531020Z * [new branch] gh/XilunWu/165/head -> origin/gh/XilunWu/165/head 2025-09-07T06:13:52.7532526Z * [new branch] gh/XilunWu/165/orig -> origin/gh/XilunWu/165/orig 2025-09-07T06:13:52.7534382Z * [new branch] gh/XilunWu/166/base -> origin/gh/XilunWu/166/base 2025-09-07T06:13:52.7535573Z * [new branch] gh/XilunWu/166/head -> origin/gh/XilunWu/166/head 2025-09-07T06:13:52.7536751Z * [new branch] gh/XilunWu/166/orig -> origin/gh/XilunWu/166/orig 2025-09-07T06:13:52.7538564Z * [new branch] gh/XilunWu/167/base -> origin/gh/XilunWu/167/base 2025-09-07T06:13:52.7539710Z * [new branch] gh/XilunWu/167/head -> origin/gh/XilunWu/167/head 2025-09-07T06:13:52.7540886Z * [new branch] gh/XilunWu/167/orig -> origin/gh/XilunWu/167/orig 2025-09-07T06:13:52.7542711Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-09-07T06:13:52.7543859Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-09-07T06:13:52.7545007Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-09-07T06:13:52.7546650Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-09-07T06:13:52.7547722Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-09-07T06:13:52.7549014Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-09-07T06:13:52.7550932Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-09-07T06:13:52.7552129Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-09-07T06:13:52.7553178Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-09-07T06:13:52.7555271Z * [new branch] gh/XuehaiPan/14/base -> 
origin/gh/XuehaiPan/14/base 2025-09-07T06:13:52.7556479Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-09-07T06:13:52.7557554Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-09-07T06:13:52.7559276Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-09-07T06:13:52.7560353Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-09-07T06:13:52.7561788Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-09-07T06:13:52.7563544Z * [new branch] gh/XuehaiPan/189/base -> origin/gh/XuehaiPan/189/base 2025-09-07T06:13:52.7564585Z * [new branch] gh/XuehaiPan/189/head -> origin/gh/XuehaiPan/189/head 2025-09-07T06:13:52.7565716Z * [new branch] gh/XuehaiPan/189/orig -> origin/gh/XuehaiPan/189/orig 2025-09-07T06:13:52.7567355Z * [new branch] gh/XuehaiPan/232/base -> origin/gh/XuehaiPan/232/base 2025-09-07T06:13:52.7568416Z * [new branch] gh/XuehaiPan/232/head -> origin/gh/XuehaiPan/232/head 2025-09-07T06:13:52.7569513Z * [new branch] gh/XuehaiPan/232/orig -> origin/gh/XuehaiPan/232/orig 2025-09-07T06:13:52.7571193Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-09-07T06:13:52.7572727Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-09-07T06:13:52.7573830Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-09-07T06:13:52.7575472Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-09-07T06:13:52.7576553Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-09-07T06:13:52.7577694Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-09-07T06:13:52.7579323Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-09-07T06:13:52.7580586Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-09-07T06:13:52.7581876Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 
2025-09-07T06:13:52.7583573Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-09-07T06:13:52.7584750Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-09-07T06:13:52.7585774Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-09-07T06:13:52.7587466Z * [new branch] gh/XuehaiPan/257/base -> origin/gh/XuehaiPan/257/base 2025-09-07T06:13:52.7588512Z * [new branch] gh/XuehaiPan/257/head -> origin/gh/XuehaiPan/257/head 2025-09-07T06:13:52.7589649Z * [new branch] gh/XuehaiPan/257/orig -> origin/gh/XuehaiPan/257/orig 2025-09-07T06:13:52.7591282Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-09-07T06:13:52.7592334Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-09-07T06:13:52.7593469Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-09-07T06:13:52.7595084Z * [new branch] gh/XuehaiPan/290/base -> origin/gh/XuehaiPan/290/base 2025-09-07T06:13:52.7596269Z * [new branch] gh/XuehaiPan/290/head -> origin/gh/XuehaiPan/290/head 2025-09-07T06:13:52.7597396Z * [new branch] gh/XuehaiPan/290/orig -> origin/gh/XuehaiPan/290/orig 2025-09-07T06:13:52.7598938Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-09-07T06:13:52.7607725Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-09-07T06:13:52.7608404Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-09-07T06:13:52.7609044Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-09-07T06:13:52.7609685Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-09-07T06:13:52.7610313Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-09-07T06:13:52.7610954Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-09-07T06:13:52.7611660Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-09-07T06:13:52.7612513Z * [new 
branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-09-07T06:13:52.7613173Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-09-07T06:13:52.7613892Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-09-07T06:13:52.7614554Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-09-07T06:13:52.7615196Z * [new branch] gh/XuehaiPan/356/base -> origin/gh/XuehaiPan/356/base 2025-09-07T06:13:52.7615856Z * [new branch] gh/XuehaiPan/356/head -> origin/gh/XuehaiPan/356/head 2025-09-07T06:13:52.7616806Z * [new branch] gh/XuehaiPan/356/orig -> origin/gh/XuehaiPan/356/orig 2025-09-07T06:13:52.7618473Z * [new branch] gh/XuehaiPan/357/base -> origin/gh/XuehaiPan/357/base 2025-09-07T06:13:52.7619562Z * [new branch] gh/XuehaiPan/357/head -> origin/gh/XuehaiPan/357/head 2025-09-07T06:13:52.7620765Z * [new branch] gh/XuehaiPan/357/orig -> origin/gh/XuehaiPan/357/orig 2025-09-07T06:13:52.7622819Z * [new branch] gh/XuehaiPan/358/base -> origin/gh/XuehaiPan/358/base 2025-09-07T06:13:52.7624030Z * [new branch] gh/XuehaiPan/358/head -> origin/gh/XuehaiPan/358/head 2025-09-07T06:13:52.7625153Z * [new branch] gh/XuehaiPan/358/orig -> origin/gh/XuehaiPan/358/orig 2025-09-07T06:13:52.7626787Z * [new branch] gh/XuehaiPan/359/base -> origin/gh/XuehaiPan/359/base 2025-09-07T06:13:52.7627863Z * [new branch] gh/XuehaiPan/359/head -> origin/gh/XuehaiPan/359/head 2025-09-07T06:13:52.7628991Z * [new branch] gh/XuehaiPan/359/orig -> origin/gh/XuehaiPan/359/orig 2025-09-07T06:13:52.7630562Z * [new branch] gh/XuehaiPan/360/base -> origin/gh/XuehaiPan/360/base 2025-09-07T06:13:52.7631618Z * [new branch] gh/XuehaiPan/360/head -> origin/gh/XuehaiPan/360/head 2025-09-07T06:13:52.7632737Z * [new branch] gh/XuehaiPan/360/orig -> origin/gh/XuehaiPan/360/orig 2025-09-07T06:13:52.7634499Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-09-07T06:13:52.7635504Z * [new branch] gh/XuehaiPan/365/head -> 
origin/gh/XuehaiPan/365/head 2025-09-07T06:13:52.7636618Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-09-07T06:13:52.7638287Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-09-07T06:13:52.7639337Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-09-07T06:13:52.7640946Z * [new branch] gh/XuehaiPan/369/base -> origin/gh/XuehaiPan/369/base 2025-09-07T06:13:52.7641997Z * [new branch] gh/XuehaiPan/369/head -> origin/gh/XuehaiPan/369/head 2025-09-07T06:13:52.7643300Z * [new branch] gh/XuehaiPan/369/orig -> origin/gh/XuehaiPan/369/orig 2025-09-07T06:13:52.7644756Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-09-07T06:13:52.7645778Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-09-07T06:13:52.7647071Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-09-07T06:13:52.7648926Z * [new branch] gh/XuehaiPan/380/base -> origin/gh/XuehaiPan/380/base 2025-09-07T06:13:52.7652802Z * [new branch] gh/XuehaiPan/380/head -> origin/gh/XuehaiPan/380/head 2025-09-07T06:13:52.7653914Z * [new branch] gh/XuehaiPan/380/orig -> origin/gh/XuehaiPan/380/orig 2025-09-07T06:13:52.7655718Z * [new branch] gh/XuehaiPan/381/base -> origin/gh/XuehaiPan/381/base 2025-09-07T06:13:52.7656759Z * [new branch] gh/XuehaiPan/381/head -> origin/gh/XuehaiPan/381/head 2025-09-07T06:13:52.7658505Z * [new branch] gh/XuehaiPan/382/base -> origin/gh/XuehaiPan/382/base 2025-09-07T06:13:52.7659607Z * [new branch] gh/XuehaiPan/382/head -> origin/gh/XuehaiPan/382/head 2025-09-07T06:13:52.7660794Z * [new branch] gh/XuehaiPan/382/orig -> origin/gh/XuehaiPan/382/orig 2025-09-07T06:13:52.7662587Z * [new branch] gh/XuehaiPan/383/base -> origin/gh/XuehaiPan/383/base 2025-09-07T06:13:52.7663737Z * [new branch] gh/XuehaiPan/383/head -> origin/gh/XuehaiPan/383/head 2025-09-07T06:13:52.7664927Z * [new branch] gh/XuehaiPan/383/orig -> origin/gh/XuehaiPan/383/orig 
2025-09-07T06:13:52.7666635Z * [new branch] gh/XuehaiPan/384/base -> origin/gh/XuehaiPan/384/base 2025-09-07T06:13:52.7667680Z * [new branch] gh/XuehaiPan/384/head -> origin/gh/XuehaiPan/384/head 2025-09-07T06:13:52.7668808Z * [new branch] gh/XuehaiPan/384/orig -> origin/gh/XuehaiPan/384/orig 2025-09-07T06:13:52.7670511Z * [new branch] gh/XuehaiPan/385/base -> origin/gh/XuehaiPan/385/base 2025-09-07T06:13:52.7671584Z * [new branch] gh/XuehaiPan/385/head -> origin/gh/XuehaiPan/385/head 2025-09-07T06:13:52.7672611Z * [new branch] gh/XuehaiPan/385/orig -> origin/gh/XuehaiPan/385/orig 2025-09-07T06:13:52.7674179Z * [new branch] gh/XuehaiPan/386/base -> origin/gh/XuehaiPan/386/base 2025-09-07T06:13:52.7675233Z * [new branch] gh/XuehaiPan/386/head -> origin/gh/XuehaiPan/386/head 2025-09-07T06:13:52.7676369Z * [new branch] gh/XuehaiPan/386/orig -> origin/gh/XuehaiPan/386/orig 2025-09-07T06:13:52.7677947Z * [new branch] gh/XuehaiPan/387/base -> origin/gh/XuehaiPan/387/base 2025-09-07T06:13:52.7678991Z * [new branch] gh/XuehaiPan/387/head -> origin/gh/XuehaiPan/387/head 2025-09-07T06:13:52.7680168Z * [new branch] gh/XuehaiPan/387/orig -> origin/gh/XuehaiPan/387/orig 2025-09-07T06:13:52.7682071Z * [new branch] gh/ZainRizvi/1/base -> origin/gh/ZainRizvi/1/base 2025-09-07T06:13:52.7683136Z * [new branch] gh/ZainRizvi/1/head -> origin/gh/ZainRizvi/1/head 2025-09-07T06:13:52.7684633Z * [new branch] gh/ZainRizvi/2/base -> origin/gh/ZainRizvi/2/base 2025-09-07T06:13:52.7685613Z * [new branch] gh/ZainRizvi/2/head -> origin/gh/ZainRizvi/2/head 2025-09-07T06:13:52.7687156Z * [new branch] gh/ZainRizvi/3/base -> origin/gh/ZainRizvi/3/base 2025-09-07T06:13:52.7688114Z * [new branch] gh/ZainRizvi/3/head -> origin/gh/ZainRizvi/3/head 2025-09-07T06:13:52.7689708Z * [new branch] gh/ZainRizvi/4/base -> origin/gh/ZainRizvi/4/base 2025-09-07T06:13:52.7690725Z * [new branch] gh/ZainRizvi/4/head -> origin/gh/ZainRizvi/4/head 2025-09-07T06:13:52.7692708Z * [new branch] gh/ZainRizvi/5/base -> 
origin/gh/ZainRizvi/5/base 2025-09-07T06:13:52.7693653Z * [new branch] gh/ZainRizvi/5/head -> origin/gh/ZainRizvi/5/head 2025-09-07T06:13:52.7695218Z * [new branch] gh/ZainRizvi/6/base -> origin/gh/ZainRizvi/6/base 2025-09-07T06:13:52.7696286Z * [new branch] gh/ZainRizvi/6/head -> origin/gh/ZainRizvi/6/head 2025-09-07T06:13:52.7697430Z * [new branch] gh/ZainRizvi/6/orig -> origin/gh/ZainRizvi/6/orig 2025-09-07T06:13:52.7699041Z * [new branch] gh/ZainRizvi/7/base -> origin/gh/ZainRizvi/7/base 2025-09-07T06:13:52.7700134Z * [new branch] gh/ZainRizvi/7/head -> origin/gh/ZainRizvi/7/head 2025-09-07T06:13:52.7701295Z * [new branch] gh/ZainRizvi/7/orig -> origin/gh/ZainRizvi/7/orig 2025-09-07T06:13:52.7702968Z * [new branch] gh/ZainRizvi/8/base -> origin/gh/ZainRizvi/8/base 2025-09-07T06:13:52.7704200Z * [new branch] gh/ZainRizvi/8/head -> origin/gh/ZainRizvi/8/head 2025-09-07T06:13:52.7705834Z * [new branch] gh/ZainRizvi/9/base -> origin/gh/ZainRizvi/9/base 2025-09-07T06:13:52.7706837Z * [new branch] gh/ZainRizvi/9/head -> origin/gh/ZainRizvi/9/head 2025-09-07T06:13:52.7707971Z * [new branch] gh/ZainRizvi/9/orig -> origin/gh/ZainRizvi/9/orig 2025-09-07T06:13:52.7709928Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-09-07T06:13:52.7710980Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-09-07T06:13:52.7712233Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-09-07T06:13:52.7713837Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-09-07T06:13:52.7744864Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-09-07T06:13:52.7745672Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-09-07T06:13:52.7746365Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-09-07T06:13:52.7747032Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-09-07T06:13:52.7747666Z 
* [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-09-07T06:13:52.7748313Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-09-07T06:13:52.7749326Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-09-07T06:13:52.7750005Z * [new branch] gh/ZhiweiYan-96/64/base -> origin/gh/ZhiweiYan-96/64/base 2025-09-07T06:13:52.7750673Z * [new branch] gh/ZhiweiYan-96/64/head -> origin/gh/ZhiweiYan-96/64/head 2025-09-07T06:13:52.7751333Z * [new branch] gh/ZhiweiYan-96/64/orig -> origin/gh/ZhiweiYan-96/64/orig 2025-09-07T06:13:52.7752002Z * [new branch] gh/ZhiweiYan-96/65/base -> origin/gh/ZhiweiYan-96/65/base 2025-09-07T06:13:52.7752663Z * [new branch] gh/ZhiweiYan-96/65/head -> origin/gh/ZhiweiYan-96/65/head 2025-09-07T06:13:52.7753333Z * [new branch] gh/ZhiweiYan-96/65/orig -> origin/gh/ZhiweiYan-96/65/orig 2025-09-07T06:13:52.7753985Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-09-07T06:13:52.7754653Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-09-07T06:13:52.7755323Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-09-07T06:13:52.7755977Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-09-07T06:13:52.7756643Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-09-07T06:13:52.7757476Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-09-07T06:13:52.7758148Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-09-07T06:13:52.7758816Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-09-07T06:13:52.7759448Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-09-07T06:13:52.7760088Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-09-07T06:13:52.7760708Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 
2025-09-07T06:13:52.7761458Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-09-07T06:13:52.7762082Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-09-07T06:13:52.7762687Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-09-07T06:13:52.7763568Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-09-07T06:13:52.7764543Z * [new branch] gh/alexsamardzic/9/base -> origin/gh/alexsamardzic/9/base 2025-09-07T06:13:52.7765213Z * [new branch] gh/alexsamardzic/9/head -> origin/gh/alexsamardzic/9/head 2025-09-07T06:13:52.7765881Z * [new branch] gh/alexsamardzic/9/orig -> origin/gh/alexsamardzic/9/orig 2025-09-07T06:13:52.7766502Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-09-07T06:13:52.7767101Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-09-07T06:13:52.7767683Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-09-07T06:13:52.7768305Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-09-07T06:13:52.7768941Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-09-07T06:13:52.7769556Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-09-07T06:13:52.7770192Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-09-07T06:13:52.7770805Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-09-07T06:13:52.7771556Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-09-07T06:13:52.7772376Z * [new branch] gh/andrewor14/51/base -> origin/gh/andrewor14/51/base 2025-09-07T06:13:52.7773015Z * [new branch] gh/andrewor14/51/orig -> origin/gh/andrewor14/51/orig 2025-09-07T06:13:52.7774405Z * [new branch] gh/andyanwang/1/base -> origin/gh/andyanwang/1/base 2025-09-07T06:13:52.7775528Z * [new branch] gh/andyanwang/1/head -> origin/gh/andyanwang/1/head 
2025-09-07T06:13:52.7776682Z * [new branch] gh/andyanwang/1/orig -> origin/gh/andyanwang/1/orig 2025-09-07T06:13:52.7778580Z * [new branch] gh/andyanwang/13/base -> origin/gh/andyanwang/13/base 2025-09-07T06:13:52.7779652Z * [new branch] gh/andyanwang/13/head -> origin/gh/andyanwang/13/head 2025-09-07T06:13:52.7781684Z * [new branch] gh/andyanwang/13/orig -> origin/gh/andyanwang/13/orig 2025-09-07T06:13:52.7783304Z * [new branch] gh/andyanwang/2/base -> origin/gh/andyanwang/2/base 2025-09-07T06:13:52.7784540Z * [new branch] gh/andyanwang/2/head -> origin/gh/andyanwang/2/head 2025-09-07T06:13:52.7785704Z * [new branch] gh/andyanwang/2/orig -> origin/gh/andyanwang/2/orig 2025-09-07T06:13:52.7787408Z * [new branch] gh/andyanwang/28/base -> origin/gh/andyanwang/28/base 2025-09-07T06:13:52.7788593Z * [new branch] gh/andyanwang/28/head -> origin/gh/andyanwang/28/head 2025-09-07T06:13:52.7789913Z * [new branch] gh/andyanwang/28/orig -> origin/gh/andyanwang/28/orig 2025-09-07T06:13:52.7791460Z * [new branch] gh/andyanwang/3/base -> origin/gh/andyanwang/3/base 2025-09-07T06:13:52.7792590Z * [new branch] gh/andyanwang/3/head -> origin/gh/andyanwang/3/head 2025-09-07T06:13:52.7793774Z * [new branch] gh/andyanwang/3/orig -> origin/gh/andyanwang/3/orig 2025-09-07T06:13:52.7795429Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-09-07T06:13:52.7796682Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-09-07T06:13:52.7798352Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-09-07T06:13:52.7799592Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-09-07T06:13:52.7801630Z * [new branch] gh/andyanwang/32/base -> origin/gh/andyanwang/32/base 2025-09-07T06:13:52.7802739Z * [new branch] gh/andyanwang/32/head -> origin/gh/andyanwang/32/head 2025-09-07T06:13:52.7804040Z * [new branch] gh/andyanwang/32/orig -> origin/gh/andyanwang/32/orig 2025-09-07T06:13:52.7805726Z * [new branch] 
gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-09-07T06:13:52.7806868Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-09-07T06:13:52.7808027Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-09-07T06:13:52.7809715Z * [new branch] gh/andyanwang/4/base -> origin/gh/andyanwang/4/base 2025-09-07T06:13:52.7810704Z * [new branch] gh/andyanwang/4/head -> origin/gh/andyanwang/4/head 2025-09-07T06:13:52.7812249Z * [new branch] gh/andyanwang/4/orig -> origin/gh/andyanwang/4/orig 2025-09-07T06:13:52.7814229Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-09-07T06:13:52.7815367Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-09-07T06:13:52.7817036Z * [new branch] gh/angelayi/111/base -> origin/gh/angelayi/111/base 2025-09-07T06:13:52.7818135Z * [new branch] gh/angelayi/111/head -> origin/gh/angelayi/111/head 2025-09-07T06:13:52.7819253Z * [new branch] gh/angelayi/111/orig -> origin/gh/angelayi/111/orig 2025-09-07T06:13:52.7820925Z * [new branch] gh/angelayi/112/base -> origin/gh/angelayi/112/base 2025-09-07T06:13:52.7822119Z * [new branch] gh/angelayi/112/head -> origin/gh/angelayi/112/head 2025-09-07T06:13:52.7823354Z * [new branch] gh/angelayi/112/orig -> origin/gh/angelayi/112/orig 2025-09-07T06:13:52.7825261Z * [new branch] gh/angelayi/113/base -> origin/gh/angelayi/113/base 2025-09-07T06:13:52.7826233Z * [new branch] gh/angelayi/113/head -> origin/gh/angelayi/113/head 2025-09-07T06:13:52.7827350Z * [new branch] gh/angelayi/113/orig -> origin/gh/angelayi/113/orig 2025-09-07T06:13:52.7828987Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-09-07T06:13:52.7829964Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-09-07T06:13:52.7831088Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-09-07T06:13:52.7832701Z * [new branch] gh/angelayi/115/base -> origin/gh/angelayi/115/base 
2025-09-07T06:13:52.7833827Z * [new branch] gh/angelayi/115/head -> origin/gh/angelayi/115/head 2025-09-07T06:13:52.7835022Z * [new branch] gh/angelayi/115/orig -> origin/gh/angelayi/115/orig 2025-09-07T06:13:52.7837198Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-09-07T06:13:52.7838203Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-09-07T06:13:52.7839309Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-09-07T06:13:52.7840990Z * [new branch] gh/anijain2305/766/base -> origin/gh/anijain2305/766/base 2025-09-07T06:13:52.7841967Z * [new branch] gh/anijain2305/766/head -> origin/gh/anijain2305/766/head 2025-09-07T06:13:52.7843056Z * [new branch] gh/anijain2305/766/orig -> origin/gh/anijain2305/766/orig 2025-09-07T06:13:52.7844710Z * [new branch] gh/anijain2305/790/base -> origin/gh/anijain2305/790/base 2025-09-07T06:13:52.7845989Z * [new branch] gh/anijain2305/790/head -> origin/gh/anijain2305/790/head 2025-09-07T06:13:52.7846865Z * [new branch] gh/anijain2305/790/orig -> origin/gh/anijain2305/790/orig 2025-09-07T06:13:52.7848456Z * [new branch] gh/anijain2305/792/base -> origin/gh/anijain2305/792/base 2025-09-07T06:13:52.7849976Z * [new branch] gh/anijain2305/792/head -> origin/gh/anijain2305/792/head 2025-09-07T06:13:52.7851123Z * [new branch] gh/anijain2305/792/orig -> origin/gh/anijain2305/792/orig 2025-09-07T06:13:52.7852932Z * [new branch] gh/anijain2305/803/base -> origin/gh/anijain2305/803/base 2025-09-07T06:13:52.7854019Z * [new branch] gh/anijain2305/803/head -> origin/gh/anijain2305/803/head 2025-09-07T06:13:52.7855157Z * [new branch] gh/anijain2305/803/orig -> origin/gh/anijain2305/803/orig 2025-09-07T06:13:52.7856841Z * [new branch] gh/anijain2305/804/base -> origin/gh/anijain2305/804/base 2025-09-07T06:13:52.7857881Z * [new branch] gh/anijain2305/804/head -> origin/gh/anijain2305/804/head 2025-09-07T06:13:52.7859238Z * [new branch] gh/anijain2305/804/orig -> 
origin/gh/anijain2305/804/orig 2025-09-07T06:13:52.7860878Z * [new branch] gh/anijain2305/805/base -> origin/gh/anijain2305/805/base 2025-09-07T06:13:52.7861971Z * [new branch] gh/anijain2305/805/head -> origin/gh/anijain2305/805/head 2025-09-07T06:13:52.7863201Z * [new branch] gh/anijain2305/805/orig -> origin/gh/anijain2305/805/orig 2025-09-07T06:13:52.7864974Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-09-07T06:13:52.7866018Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-09-07T06:13:52.7867170Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-09-07T06:13:52.7868847Z * [new branch] gh/anijain2305/812/base -> origin/gh/anijain2305/812/base 2025-09-07T06:13:52.7870106Z * [new branch] gh/anijain2305/812/head -> origin/gh/anijain2305/812/head 2025-09-07T06:13:52.7871108Z * [new branch] gh/anijain2305/812/orig -> origin/gh/anijain2305/812/orig 2025-09-07T06:13:52.7872689Z * [new branch] gh/anijain2305/838/base -> origin/gh/anijain2305/838/base 2025-09-07T06:13:52.7873775Z * [new branch] gh/anijain2305/838/head -> origin/gh/anijain2305/838/head 2025-09-07T06:13:52.7874886Z * [new branch] gh/anijain2305/838/orig -> origin/gh/anijain2305/838/orig 2025-09-07T06:13:52.7876540Z * [new branch] gh/anijain2305/839/base -> origin/gh/anijain2305/839/base 2025-09-07T06:13:52.7877579Z * [new branch] gh/anijain2305/839/head -> origin/gh/anijain2305/839/head 2025-09-07T06:13:52.7878717Z * [new branch] gh/anijain2305/839/orig -> origin/gh/anijain2305/839/orig 2025-09-07T06:13:52.7880297Z * [new branch] gh/anijain2305/843/base -> origin/gh/anijain2305/843/base 2025-09-07T06:13:52.7881445Z * [new branch] gh/anijain2305/843/head -> origin/gh/anijain2305/843/head 2025-09-07T06:13:52.7882481Z * [new branch] gh/anijain2305/843/orig -> origin/gh/anijain2305/843/orig 2025-09-07T06:13:52.7884090Z * [new branch] gh/anijain2305/844/base -> origin/gh/anijain2305/844/base 2025-09-07T06:13:52.7885141Z * 
[new branch] gh/anijain2305/844/head -> origin/gh/anijain2305/844/head 2025-09-07T06:13:52.7886236Z * [new branch] gh/anijain2305/844/orig -> origin/gh/anijain2305/844/orig 2025-09-07T06:13:52.7887895Z * [new branch] gh/anijain2305/846/base -> origin/gh/anijain2305/846/base 2025-09-07T06:13:52.7888954Z * [new branch] gh/anijain2305/846/head -> origin/gh/anijain2305/846/head 2025-09-07T06:13:52.7890042Z * [new branch] gh/anijain2305/846/orig -> origin/gh/anijain2305/846/orig 2025-09-07T06:13:52.7891974Z * [new branch] gh/anijain2305/848/base -> origin/gh/anijain2305/848/base 2025-09-07T06:13:52.7893249Z * [new branch] gh/anijain2305/848/head -> origin/gh/anijain2305/848/head 2025-09-07T06:13:52.7894379Z * [new branch] gh/anijain2305/848/orig -> origin/gh/anijain2305/848/orig 2025-09-07T06:13:52.7896047Z * [new branch] gh/anijain2305/849/base -> origin/gh/anijain2305/849/base 2025-09-07T06:13:52.7897108Z * [new branch] gh/anijain2305/849/head -> origin/gh/anijain2305/849/head 2025-09-07T06:13:52.7898298Z * [new branch] gh/anijain2305/849/orig -> origin/gh/anijain2305/849/orig 2025-09-07T06:13:52.7900349Z * [new branch] gh/anijain2305/850/base -> origin/gh/anijain2305/850/base 2025-09-07T06:13:52.7901441Z * [new branch] gh/anijain2305/850/head -> origin/gh/anijain2305/850/head 2025-09-07T06:13:52.7902629Z * [new branch] gh/anijain2305/850/orig -> origin/gh/anijain2305/850/orig 2025-09-07T06:13:52.7904450Z * [new branch] gh/anijain2305/851/base -> origin/gh/anijain2305/851/base 2025-09-07T06:13:52.7905508Z * [new branch] gh/anijain2305/851/head -> origin/gh/anijain2305/851/head 2025-09-07T06:13:52.7906637Z * [new branch] gh/anijain2305/851/orig -> origin/gh/anijain2305/851/orig 2025-09-07T06:13:52.7908436Z * [new branch] gh/anijain2305/852/base -> origin/gh/anijain2305/852/base 2025-09-07T06:13:52.7909500Z * [new branch] gh/anijain2305/852/head -> origin/gh/anijain2305/852/head 2025-09-07T06:13:52.7910623Z * [new branch] gh/anijain2305/852/orig -> 
origin/gh/anijain2305/852/orig 2025-09-07T06:13:52.7912267Z * [new branch] gh/anijain2305/853/base -> origin/gh/anijain2305/853/base 2025-09-07T06:13:52.7913760Z * [new branch] gh/anijain2305/853/head -> origin/gh/anijain2305/853/head 2025-09-07T06:13:52.7914393Z * [new branch] gh/anijain2305/853/orig -> origin/gh/anijain2305/853/orig 2025-09-07T06:13:52.7916039Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-09-07T06:13:52.7917246Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-09-07T06:13:52.7918362Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-09-07T06:13:52.7920080Z * [new branch] gh/anijain2305/855/base -> origin/gh/anijain2305/855/base 2025-09-07T06:13:52.7921185Z * [new branch] gh/anijain2305/855/head -> origin/gh/anijain2305/855/head 2025-09-07T06:13:52.7922311Z * [new branch] gh/anijain2305/855/orig -> origin/gh/anijain2305/855/orig 2025-09-07T06:13:52.7923955Z * [new branch] gh/anijain2305/856/base -> origin/gh/anijain2305/856/base 2025-09-07T06:13:52.7925157Z * [new branch] gh/anijain2305/856/head -> origin/gh/anijain2305/856/head 2025-09-07T06:13:52.7926357Z * [new branch] gh/anijain2305/856/orig -> origin/gh/anijain2305/856/orig 2025-09-07T06:13:52.7927982Z * [new branch] gh/anijain2305/857/base -> origin/gh/anijain2305/857/base 2025-09-07T06:13:52.7928979Z * [new branch] gh/anijain2305/857/head -> origin/gh/anijain2305/857/head 2025-09-07T06:13:52.7930103Z * [new branch] gh/anijain2305/857/orig -> origin/gh/anijain2305/857/orig 2025-09-07T06:13:52.7932026Z * [new branch] gh/anijain2305/858/base -> origin/gh/anijain2305/858/base 2025-09-07T06:13:52.7933170Z * [new branch] gh/anijain2305/858/head -> origin/gh/anijain2305/858/head 2025-09-07T06:13:52.7934318Z * [new branch] gh/anijain2305/858/orig -> origin/gh/anijain2305/858/orig 2025-09-07T06:13:52.7936095Z * [new branch] gh/anijain2305/859/base -> origin/gh/anijain2305/859/base 2025-09-07T06:13:52.7937187Z * 
[new branch] gh/anijain2305/859/head -> origin/gh/anijain2305/859/head 2025-09-07T06:13:52.7938412Z * [new branch] gh/anijain2305/859/orig -> origin/gh/anijain2305/859/orig 2025-09-07T06:13:52.7940019Z * [new branch] gh/anijain2305/860/base -> origin/gh/anijain2305/860/base 2025-09-07T06:13:52.7941096Z * [new branch] gh/anijain2305/860/head -> origin/gh/anijain2305/860/head 2025-09-07T06:13:52.7942306Z * [new branch] gh/anijain2305/860/orig -> origin/gh/anijain2305/860/orig 2025-09-07T06:13:52.7944064Z * [new branch] gh/anijain2305/861/base -> origin/gh/anijain2305/861/base 2025-09-07T06:13:52.7945118Z * [new branch] gh/anijain2305/861/head -> origin/gh/anijain2305/861/head 2025-09-07T06:13:52.7946299Z * [new branch] gh/anijain2305/861/orig -> origin/gh/anijain2305/861/orig 2025-09-07T06:13:52.7947954Z * [new branch] gh/anijain2305/862/base -> origin/gh/anijain2305/862/base 2025-09-07T06:13:52.7949499Z * [new branch] gh/anijain2305/862/head -> origin/gh/anijain2305/862/head 2025-09-07T06:13:52.7952259Z * [new branch] gh/anijain2305/862/orig -> origin/gh/anijain2305/862/orig 2025-09-07T06:13:52.7953974Z * [new branch] gh/anijain2305/863/base -> origin/gh/anijain2305/863/base 2025-09-07T06:13:52.7955198Z * [new branch] gh/anijain2305/863/head -> origin/gh/anijain2305/863/head 2025-09-07T06:13:52.7956357Z * [new branch] gh/anijain2305/863/orig -> origin/gh/anijain2305/863/orig 2025-09-07T06:13:52.7958159Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-09-07T06:13:52.7959238Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-09-07T06:13:52.7960372Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-09-07T06:13:52.7962268Z * [new branch] gh/anijain2305/865/base -> origin/gh/anijain2305/865/base 2025-09-07T06:13:52.7963349Z * [new branch] gh/anijain2305/865/head -> origin/gh/anijain2305/865/head 2025-09-07T06:13:52.7964468Z * [new branch] gh/anijain2305/865/orig -> 
origin/gh/anijain2305/865/orig 2025-09-07T06:13:52.7966150Z * [new branch] gh/anijain2305/866/base -> origin/gh/anijain2305/866/base 2025-09-07T06:13:52.7967176Z * [new branch] gh/anijain2305/866/head -> origin/gh/anijain2305/866/head 2025-09-07T06:13:52.7968277Z * [new branch] gh/anijain2305/866/orig -> origin/gh/anijain2305/866/orig 2025-09-07T06:13:52.7970285Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-09-07T06:13:52.7971393Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-09-07T06:13:52.7972851Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-09-07T06:13:52.7975068Z * [new branch] gh/ankitageorge/13/base -> origin/gh/ankitageorge/13/base 2025-09-07T06:13:52.7976031Z * [new branch] gh/ankitageorge/13/head -> origin/gh/ankitageorge/13/head 2025-09-07T06:13:52.7977282Z * [new branch] gh/ankitageorge/13/orig -> origin/gh/ankitageorge/13/orig 2025-09-07T06:13:52.7979086Z * [new branch] gh/ankitageorge/14/base -> origin/gh/ankitageorge/14/base 2025-09-07T06:13:52.7980114Z * [new branch] gh/ankitageorge/14/head -> origin/gh/ankitageorge/14/head 2025-09-07T06:13:52.7981620Z * [new branch] gh/ankitageorge/14/orig -> origin/gh/ankitageorge/14/orig 2025-09-07T06:13:52.7983238Z * [new branch] gh/ankitageorge/15/base -> origin/gh/ankitageorge/15/base 2025-09-07T06:13:52.7984514Z * [new branch] gh/ankitageorge/15/head -> origin/gh/ankitageorge/15/head 2025-09-07T06:13:52.7985692Z * [new branch] gh/ankitageorge/15/orig -> origin/gh/ankitageorge/15/orig 2025-09-07T06:13:52.7987422Z * [new branch] gh/ankitageorge/16/base -> origin/gh/ankitageorge/16/base 2025-09-07T06:13:52.7988601Z * [new branch] gh/ankitageorge/16/head -> origin/gh/ankitageorge/16/head 2025-09-07T06:13:52.7989816Z * [new branch] gh/ankitageorge/16/orig -> origin/gh/ankitageorge/16/orig 2025-09-07T06:13:52.7991630Z * [new branch] gh/ankitageorge/17/base -> origin/gh/ankitageorge/17/base 2025-09-07T06:13:52.7992663Z * [new 
branch] gh/ankitageorge/17/head -> origin/gh/ankitageorge/17/head 2025-09-07T06:13:52.7993750Z * [new branch] gh/ankitageorge/17/orig -> origin/gh/ankitageorge/17/orig 2025-09-07T06:13:52.7995555Z * [new branch] gh/ankitageorge/21/base -> origin/gh/ankitageorge/21/base 2025-09-07T06:13:52.7996711Z * [new branch] gh/ankitageorge/21/head -> origin/gh/ankitageorge/21/head 2025-09-07T06:13:52.7997860Z * [new branch] gh/ankitageorge/21/orig -> origin/gh/ankitageorge/21/orig 2025-09-07T06:13:52.7999897Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-09-07T06:13:52.8001048Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-09-07T06:13:52.8002652Z * [new branch] gh/anshul-si/15/base -> origin/gh/anshul-si/15/base 2025-09-07T06:13:52.8003736Z * [new branch] gh/anshul-si/15/head -> origin/gh/anshul-si/15/head 2025-09-07T06:13:52.8004948Z * [new branch] gh/anshul-si/15/orig -> origin/gh/anshul-si/15/orig 2025-09-07T06:13:52.8006726Z * [new branch] gh/anshul-si/16/base -> origin/gh/anshul-si/16/base 2025-09-07T06:13:52.8007785Z * [new branch] gh/anshul-si/16/head -> origin/gh/anshul-si/16/head 2025-09-07T06:13:52.8008890Z * [new branch] gh/anshul-si/16/orig -> origin/gh/anshul-si/16/orig 2025-09-07T06:13:52.8010736Z * [new branch] gh/anshul-si/17/base -> origin/gh/anshul-si/17/base 2025-09-07T06:13:52.8012190Z * [new branch] gh/anshul-si/17/head -> origin/gh/anshul-si/17/head 2025-09-07T06:13:52.8013639Z * [new branch] gh/anshul-si/17/orig -> origin/gh/anshul-si/17/orig 2025-09-07T06:13:52.8015451Z * [new branch] gh/anshul-si/18/base -> origin/gh/anshul-si/18/base 2025-09-07T06:13:52.8016874Z * [new branch] gh/anshul-si/18/head -> origin/gh/anshul-si/18/head 2025-09-07T06:13:52.8017863Z * [new branch] gh/anshul-si/18/orig -> origin/gh/anshul-si/18/orig 2025-09-07T06:13:52.8019639Z * [new branch] gh/anshul-si/19/base -> origin/gh/anshul-si/19/base 2025-09-07T06:13:52.8020816Z * [new branch] gh/anshul-si/19/head -> 
origin/gh/anshul-si/19/head 2025-09-07T06:13:52.8021983Z * [new branch] gh/anshul-si/19/orig -> origin/gh/anshul-si/19/orig 2025-09-07T06:13:52.8023900Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-09-07T06:13:52.8024619Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-09-07T06:13:52.8026514Z * [new branch] gh/anshul-si/20/base -> origin/gh/anshul-si/20/base 2025-09-07T06:13:52.8027595Z * [new branch] gh/anshul-si/20/head -> origin/gh/anshul-si/20/head 2025-09-07T06:13:52.8028731Z * [new branch] gh/anshul-si/20/orig -> origin/gh/anshul-si/20/orig 2025-09-07T06:13:52.8030310Z * [new branch] gh/anshul-si/21/base -> origin/gh/anshul-si/21/base 2025-09-07T06:13:52.8031369Z * [new branch] gh/anshul-si/21/head -> origin/gh/anshul-si/21/head 2025-09-07T06:13:52.8032532Z * [new branch] gh/anshul-si/21/orig -> origin/gh/anshul-si/21/orig 2025-09-07T06:13:52.8034227Z * [new branch] gh/anshul-si/22/base -> origin/gh/anshul-si/22/base 2025-09-07T06:13:52.8035303Z * [new branch] gh/anshul-si/22/head -> origin/gh/anshul-si/22/head 2025-09-07T06:13:52.8036453Z * [new branch] gh/anshul-si/22/orig -> origin/gh/anshul-si/22/orig 2025-09-07T06:13:52.8037902Z * [new branch] gh/anshul-si/23/base -> origin/gh/anshul-si/23/base 2025-09-07T06:13:52.8039096Z * [new branch] gh/anshul-si/23/head -> origin/gh/anshul-si/23/head 2025-09-07T06:13:52.8040216Z * [new branch] gh/anshul-si/23/orig -> origin/gh/anshul-si/23/orig 2025-09-07T06:13:52.8041840Z * [new branch] gh/anshul-si/24/base -> origin/gh/anshul-si/24/base 2025-09-07T06:13:52.8042990Z * [new branch] gh/anshul-si/24/head -> origin/gh/anshul-si/24/head 2025-09-07T06:13:52.8044160Z * [new branch] gh/anshul-si/24/orig -> origin/gh/anshul-si/24/orig 2025-09-07T06:13:52.8045797Z * [new branch] gh/anshul-si/25/base -> origin/gh/anshul-si/25/base 2025-09-07T06:13:52.8046964Z * [new branch] gh/anshul-si/25/head -> origin/gh/anshul-si/25/head 2025-09-07T06:13:52.8048143Z * [new branch] 
gh/anshul-si/25/orig -> origin/gh/anshul-si/25/orig 2025-09-07T06:13:52.8050208Z * [new branch] gh/anshul-si/26/base -> origin/gh/anshul-si/26/base 2025-09-07T06:13:52.8051384Z * [new branch] gh/anshul-si/26/head -> origin/gh/anshul-si/26/head 2025-09-07T06:13:52.8052630Z * [new branch] gh/anshul-si/26/orig -> origin/gh/anshul-si/26/orig 2025-09-07T06:13:52.8054398Z * [new branch] gh/anshul-si/27/base -> origin/gh/anshul-si/27/base 2025-09-07T06:13:52.8055547Z * [new branch] gh/anshul-si/27/head -> origin/gh/anshul-si/27/head 2025-09-07T06:13:52.8056717Z * [new branch] gh/anshul-si/27/orig -> origin/gh/anshul-si/27/orig 2025-09-07T06:13:52.8058284Z * [new branch] gh/anshul-si/28/base -> origin/gh/anshul-si/28/base 2025-09-07T06:13:52.8059418Z * [new branch] gh/anshul-si/28/head -> origin/gh/anshul-si/28/head 2025-09-07T06:13:52.8060562Z * [new branch] gh/anshul-si/28/orig -> origin/gh/anshul-si/28/orig 2025-09-07T06:13:52.8062107Z * [new branch] gh/anshul-si/29/base -> origin/gh/anshul-si/29/base 2025-09-07T06:13:52.8063514Z * [new branch] gh/anshul-si/29/head -> origin/gh/anshul-si/29/head 2025-09-07T06:13:52.8064670Z * [new branch] gh/anshul-si/29/orig -> origin/gh/anshul-si/29/orig 2025-09-07T06:13:52.8066276Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-09-07T06:13:52.8067271Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-09-07T06:13:52.8068748Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-09-07T06:13:52.8069905Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-09-07T06:13:52.8071684Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-09-07T06:13:52.8072676Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-09-07T06:13:52.8074761Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-09-07T06:13:52.8075808Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-09-07T06:13:52.8077891Z * [new 
branch] gh/bdhirsh/650/base -> origin/gh/bdhirsh/650/base 2025-09-07T06:13:52.8079139Z * [new branch] gh/bdhirsh/650/head -> origin/gh/bdhirsh/650/head 2025-09-07T06:13:52.8080312Z * [new branch] gh/bdhirsh/650/orig -> origin/gh/bdhirsh/650/orig 2025-09-07T06:13:52.8081928Z * [new branch] gh/bdhirsh/663/base -> origin/gh/bdhirsh/663/base 2025-09-07T06:13:52.8083062Z * [new branch] gh/bdhirsh/663/head -> origin/gh/bdhirsh/663/head 2025-09-07T06:13:52.8084183Z * [new branch] gh/bdhirsh/663/orig -> origin/gh/bdhirsh/663/orig 2025-09-07T06:13:52.8085945Z * [new branch] gh/bdhirsh/665/base -> origin/gh/bdhirsh/665/base 2025-09-07T06:13:52.8086987Z * [new branch] gh/bdhirsh/665/head -> origin/gh/bdhirsh/665/head 2025-09-07T06:13:52.8088117Z * [new branch] gh/bdhirsh/665/orig -> origin/gh/bdhirsh/665/orig 2025-09-07T06:13:52.8090184Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-09-07T06:13:52.8091308Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-09-07T06:13:52.8092879Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-09-07T06:13:52.8094833Z * [new branch] gh/bdhirsh/667/base -> origin/gh/bdhirsh/667/base 2025-09-07T06:13:52.8095935Z * [new branch] gh/bdhirsh/667/head -> origin/gh/bdhirsh/667/head 2025-09-07T06:13:52.8097109Z * [new branch] gh/bdhirsh/667/orig -> origin/gh/bdhirsh/667/orig 2025-09-07T06:13:52.8098759Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-09-07T06:13:52.8099886Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-09-07T06:13:52.8101032Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-09-07T06:13:52.8102872Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-09-07T06:13:52.8104030Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-09-07T06:13:52.8105138Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-09-07T06:13:52.8106994Z * [new branch] 
gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-09-07T06:13:52.8108196Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-09-07T06:13:52.8109406Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-09-07T06:13:52.8111502Z * [new branch] gh/benjaminglass1/100/base -> origin/gh/benjaminglass1/100/base 2025-09-07T06:13:52.8112549Z * [new branch] gh/benjaminglass1/100/head -> origin/gh/benjaminglass1/100/head 2025-09-07T06:13:52.8113779Z * [new branch] gh/benjaminglass1/100/orig -> origin/gh/benjaminglass1/100/orig 2025-09-07T06:13:52.8115443Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-09-07T06:13:52.8116520Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-09-07T06:13:52.8117674Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-09-07T06:13:52.8119411Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-09-07T06:13:52.8120444Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-09-07T06:13:52.8121550Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-09-07T06:13:52.8123138Z * [new branch] gh/benjaminglass1/103/base -> origin/gh/benjaminglass1/103/base 2025-09-07T06:13:52.8124188Z * [new branch] gh/benjaminglass1/103/head -> origin/gh/benjaminglass1/103/head 2025-09-07T06:13:52.8125304Z * [new branch] gh/benjaminglass1/103/orig -> origin/gh/benjaminglass1/103/orig 2025-09-07T06:13:52.8126901Z * [new branch] gh/benjaminglass1/104/base -> origin/gh/benjaminglass1/104/base 2025-09-07T06:13:52.8127932Z * [new branch] gh/benjaminglass1/104/head -> origin/gh/benjaminglass1/104/head 2025-09-07T06:13:52.8129088Z * [new branch] gh/benjaminglass1/104/orig -> origin/gh/benjaminglass1/104/orig 2025-09-07T06:13:52.8130679Z * [new branch] gh/benjaminglass1/105/base -> origin/gh/benjaminglass1/105/base 2025-09-07T06:13:52.8132029Z * 
[new branch] gh/benjaminglass1/105/head -> origin/gh/benjaminglass1/105/head 2025-09-07T06:13:52.8133324Z * [new branch] gh/benjaminglass1/105/orig -> origin/gh/benjaminglass1/105/orig 2025-09-07T06:13:52.8135007Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-09-07T06:13:52.8136209Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-09-07T06:13:52.8137364Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-09-07T06:13:52.8139018Z * [new branch] gh/benjaminglass1/79/base -> origin/gh/benjaminglass1/79/base 2025-09-07T06:13:52.8140129Z * [new branch] gh/benjaminglass1/79/head -> origin/gh/benjaminglass1/79/head 2025-09-07T06:13:52.8141263Z * [new branch] gh/benjaminglass1/79/orig -> origin/gh/benjaminglass1/79/orig 2025-09-07T06:13:52.8142918Z * [new branch] gh/benjaminglass1/86/base -> origin/gh/benjaminglass1/86/base 2025-09-07T06:13:52.8144101Z * [new branch] gh/benjaminglass1/86/head -> origin/gh/benjaminglass1/86/head 2025-09-07T06:13:52.8145323Z * [new branch] gh/benjaminglass1/86/orig -> origin/gh/benjaminglass1/86/orig 2025-09-07T06:13:52.8146891Z * [new branch] gh/benjaminglass1/89/base -> origin/gh/benjaminglass1/89/base 2025-09-07T06:13:52.8147928Z * [new branch] gh/benjaminglass1/89/head -> origin/gh/benjaminglass1/89/head 2025-09-07T06:13:52.8149402Z * [new branch] gh/benjaminglass1/89/orig -> origin/gh/benjaminglass1/89/orig 2025-09-07T06:13:52.8151159Z * [new branch] gh/benjaminglass1/91/base -> origin/gh/benjaminglass1/91/base 2025-09-07T06:13:52.8152232Z * [new branch] gh/benjaminglass1/91/head -> origin/gh/benjaminglass1/91/head 2025-09-07T06:13:52.8153396Z * [new branch] gh/benjaminglass1/91/orig -> origin/gh/benjaminglass1/91/orig 2025-09-07T06:13:52.8155075Z * [new branch] gh/benjaminglass1/93/base -> origin/gh/benjaminglass1/93/base 2025-09-07T06:13:52.8156200Z * [new branch] gh/benjaminglass1/93/head -> origin/gh/benjaminglass1/93/head 
2025-09-07T06:13:52.8157421Z * [new branch] gh/benjaminglass1/93/orig -> origin/gh/benjaminglass1/93/orig 2025-09-07T06:13:52.8159069Z * [new branch] gh/benjaminglass1/95/base -> origin/gh/benjaminglass1/95/base 2025-09-07T06:13:52.8160164Z * [new branch] gh/benjaminglass1/95/head -> origin/gh/benjaminglass1/95/head 2025-09-07T06:13:52.8161458Z * [new branch] gh/benjaminglass1/95/orig -> origin/gh/benjaminglass1/95/orig 2025-09-07T06:13:52.8163191Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-09-07T06:13:52.8164128Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-09-07T06:13:52.8165277Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-09-07T06:13:52.8166896Z * [new branch] gh/benjaminglass1/99/base -> origin/gh/benjaminglass1/99/base 2025-09-07T06:13:52.8167920Z * [new branch] gh/benjaminglass1/99/head -> origin/gh/benjaminglass1/99/head 2025-09-07T06:13:52.8169173Z * [new branch] gh/benjaminglass1/99/orig -> origin/gh/benjaminglass1/99/orig 2025-09-07T06:13:52.8173621Z * [new branch] gh/bobrenjc93/514/base -> origin/gh/bobrenjc93/514/base 2025-09-07T06:13:52.8174838Z * [new branch] gh/bobrenjc93/514/head -> origin/gh/bobrenjc93/514/head 2025-09-07T06:13:52.8175491Z * [new branch] gh/bobrenjc93/514/orig -> origin/gh/bobrenjc93/514/orig 2025-09-07T06:13:52.8175772Z * [new branch] gh/bobrenjc93/521/base -> origin/gh/bobrenjc93/521/base 2025-09-07T06:13:52.8176459Z * [new branch] gh/bobrenjc93/521/head -> origin/gh/bobrenjc93/521/head 2025-09-07T06:13:52.8177691Z * [new branch] gh/bobrenjc93/521/orig -> origin/gh/bobrenjc93/521/orig 2025-09-07T06:13:52.8179259Z * [new branch] gh/bobrenjc93/522/base -> origin/gh/bobrenjc93/522/base 2025-09-07T06:13:52.8180383Z * [new branch] gh/bobrenjc93/522/head -> origin/gh/bobrenjc93/522/head 2025-09-07T06:13:52.8181559Z * [new branch] gh/bobrenjc93/522/orig -> origin/gh/bobrenjc93/522/orig 2025-09-07T06:13:52.8183162Z * [new 
branch] gh/bobrenjc93/525/base -> origin/gh/bobrenjc93/525/base 2025-09-07T06:13:52.8184393Z * [new branch] gh/bobrenjc93/525/head -> origin/gh/bobrenjc93/525/head 2025-09-07T06:13:52.8185545Z * [new branch] gh/bobrenjc93/525/orig -> origin/gh/bobrenjc93/525/orig 2025-09-07T06:13:52.8187087Z * [new branch] gh/bobrenjc93/526/base -> origin/gh/bobrenjc93/526/base 2025-09-07T06:13:52.8188199Z * [new branch] gh/bobrenjc93/526/head -> origin/gh/bobrenjc93/526/head 2025-09-07T06:13:52.8189344Z * [new branch] gh/bobrenjc93/526/orig -> origin/gh/bobrenjc93/526/orig 2025-09-07T06:13:52.8190831Z * [new branch] gh/bobrenjc93/527/base -> origin/gh/bobrenjc93/527/base 2025-09-07T06:13:52.8191954Z * [new branch] gh/bobrenjc93/527/head -> origin/gh/bobrenjc93/527/head 2025-09-07T06:13:52.8193086Z * [new branch] gh/bobrenjc93/527/orig -> origin/gh/bobrenjc93/527/orig 2025-09-07T06:13:52.8194600Z * [new branch] gh/bobrenjc93/528/base -> origin/gh/bobrenjc93/528/base 2025-09-07T06:13:52.8196093Z * [new branch] gh/bobrenjc93/528/head -> origin/gh/bobrenjc93/528/head 2025-09-07T06:13:52.8196824Z * [new branch] gh/bobrenjc93/528/orig -> origin/gh/bobrenjc93/528/orig 2025-09-07T06:13:52.8198415Z * [new branch] gh/bobrenjc93/529/base -> origin/gh/bobrenjc93/529/base 2025-09-07T06:13:52.8199514Z * [new branch] gh/bobrenjc93/529/head -> origin/gh/bobrenjc93/529/head 2025-09-07T06:13:52.8200649Z * [new branch] gh/bobrenjc93/529/orig -> origin/gh/bobrenjc93/529/orig 2025-09-07T06:13:52.8202156Z * [new branch] gh/bobrenjc93/535/base -> origin/gh/bobrenjc93/535/base 2025-09-07T06:13:52.8203459Z * [new branch] gh/bobrenjc93/535/head -> origin/gh/bobrenjc93/535/head 2025-09-07T06:13:52.8204573Z * [new branch] gh/bobrenjc93/535/orig -> origin/gh/bobrenjc93/535/orig 2025-09-07T06:13:52.8206142Z * [new branch] gh/bobrenjc93/537/base -> origin/gh/bobrenjc93/537/base 2025-09-07T06:13:52.8207358Z * [new branch] gh/bobrenjc93/537/head -> origin/gh/bobrenjc93/537/head 2025-09-07T06:13:52.8208555Z * [new 
branch] gh/bobrenjc93/537/orig -> origin/gh/bobrenjc93/537/orig 2025-09-07T06:13:52.8210287Z * [new branch] gh/bobrenjc93/539/base -> origin/gh/bobrenjc93/539/base 2025-09-07T06:13:52.8211539Z * [new branch] gh/bobrenjc93/539/head -> origin/gh/bobrenjc93/539/head 2025-09-07T06:13:52.8213023Z * [new branch] gh/bobrenjc93/539/orig -> origin/gh/bobrenjc93/539/orig 2025-09-07T06:13:52.8214740Z * [new branch] gh/bobrenjc93/540/base -> origin/gh/bobrenjc93/540/base 2025-09-07T06:13:52.8215996Z * [new branch] gh/bobrenjc93/540/head -> origin/gh/bobrenjc93/540/head 2025-09-07T06:13:52.8217152Z * [new branch] gh/bobrenjc93/540/orig -> origin/gh/bobrenjc93/540/orig 2025-09-07T06:13:52.8218747Z * [new branch] gh/bobrenjc93/541/base -> origin/gh/bobrenjc93/541/base 2025-09-07T06:13:52.8219927Z * [new branch] gh/bobrenjc93/541/head -> origin/gh/bobrenjc93/541/head 2025-09-07T06:13:52.8221093Z * [new branch] gh/bobrenjc93/541/orig -> origin/gh/bobrenjc93/541/orig 2025-09-07T06:13:52.8222517Z * [new branch] gh/bobrenjc93/542/base -> origin/gh/bobrenjc93/542/base 2025-09-07T06:13:52.8223663Z * [new branch] gh/bobrenjc93/542/head -> origin/gh/bobrenjc93/542/head 2025-09-07T06:13:52.8224930Z * [new branch] gh/bobrenjc93/542/orig -> origin/gh/bobrenjc93/542/orig 2025-09-07T06:13:52.8226479Z * [new branch] gh/bobrenjc93/543/base -> origin/gh/bobrenjc93/543/base 2025-09-07T06:13:52.8227573Z * [new branch] gh/bobrenjc93/543/head -> origin/gh/bobrenjc93/543/head 2025-09-07T06:13:52.8228712Z * [new branch] gh/bobrenjc93/543/orig -> origin/gh/bobrenjc93/543/orig 2025-09-07T06:13:52.8230122Z * [new branch] gh/bobrenjc93/544/base -> origin/gh/bobrenjc93/544/base 2025-09-07T06:13:52.8231248Z * [new branch] gh/bobrenjc93/544/head -> origin/gh/bobrenjc93/544/head 2025-09-07T06:13:52.8232361Z * [new branch] gh/bobrenjc93/544/orig -> origin/gh/bobrenjc93/544/orig 2025-09-07T06:13:52.8234258Z * [new branch] gh/bobrenjc93/545/base -> origin/gh/bobrenjc93/545/base 2025-09-07T06:13:52.8235552Z * [new 
branch] gh/bobrenjc93/545/head -> origin/gh/bobrenjc93/545/head 2025-09-07T06:13:52.8236737Z * [new branch] gh/bobrenjc93/545/orig -> origin/gh/bobrenjc93/545/orig 2025-09-07T06:13:52.8238406Z * [new branch] gh/bobrenjc93/546/base -> origin/gh/bobrenjc93/546/base 2025-09-07T06:13:52.8239500Z * [new branch] gh/bobrenjc93/546/head -> origin/gh/bobrenjc93/546/head 2025-09-07T06:13:52.8240652Z * [new branch] gh/bobrenjc93/546/orig -> origin/gh/bobrenjc93/546/orig 2025-09-07T06:13:52.8242821Z * [new branch] gh/bobrenjc93/547/base -> origin/gh/bobrenjc93/547/base 2025-09-07T06:13:52.8244002Z * [new branch] gh/bobrenjc93/547/head -> origin/gh/bobrenjc93/547/head 2025-09-07T06:13:52.8245207Z * [new branch] gh/bobrenjc93/547/orig -> origin/gh/bobrenjc93/547/orig 2025-09-07T06:13:52.8246668Z * [new branch] gh/bobrenjc93/548/base -> origin/gh/bobrenjc93/548/base 2025-09-07T06:13:52.8247764Z * [new branch] gh/bobrenjc93/548/head -> origin/gh/bobrenjc93/548/head 2025-09-07T06:13:52.8249036Z * [new branch] gh/bobrenjc93/548/orig -> origin/gh/bobrenjc93/548/orig 2025-09-07T06:13:52.8250843Z * [new branch] gh/bobrenjc93/549/base -> origin/gh/bobrenjc93/549/base 2025-09-07T06:13:52.8252230Z * [new branch] gh/bobrenjc93/549/head -> origin/gh/bobrenjc93/549/head 2025-09-07T06:13:52.8253467Z * [new branch] gh/bobrenjc93/549/orig -> origin/gh/bobrenjc93/549/orig 2025-09-07T06:13:52.8255601Z * [new branch] gh/bobrenjc93/550/base -> origin/gh/bobrenjc93/550/base 2025-09-07T06:13:52.8256562Z * [new branch] gh/bobrenjc93/550/head -> origin/gh/bobrenjc93/550/head 2025-09-07T06:13:52.8257840Z * [new branch] gh/bobrenjc93/550/orig -> origin/gh/bobrenjc93/550/orig 2025-09-07T06:13:52.8259698Z * [new branch] gh/bobrenjc93/551/base -> origin/gh/bobrenjc93/551/base 2025-09-07T06:13:52.8260995Z * [new branch] gh/bobrenjc93/551/head -> origin/gh/bobrenjc93/551/head 2025-09-07T06:13:52.8262169Z * [new branch] gh/bobrenjc93/551/orig -> origin/gh/bobrenjc93/551/orig 2025-09-07T06:13:52.8263925Z * [new 
branch] gh/bobrenjc93/552/base -> origin/gh/bobrenjc93/552/base 2025-09-07T06:13:52.8265155Z * [new branch] gh/bobrenjc93/552/head -> origin/gh/bobrenjc93/552/head 2025-09-07T06:13:52.8266296Z * [new branch] gh/bobrenjc93/552/orig -> origin/gh/bobrenjc93/552/orig 2025-09-07T06:13:52.8267737Z * [new branch] gh/bobrenjc93/553/base -> origin/gh/bobrenjc93/553/base 2025-09-07T06:13:52.8268865Z * [new branch] gh/bobrenjc93/553/head -> origin/gh/bobrenjc93/553/head 2025-09-07T06:13:52.8270151Z * [new branch] gh/bobrenjc93/553/orig -> origin/gh/bobrenjc93/553/orig 2025-09-07T06:13:52.8271592Z * [new branch] gh/bobrenjc93/554/base -> origin/gh/bobrenjc93/554/base 2025-09-07T06:13:52.8272724Z * [new branch] gh/bobrenjc93/554/head -> origin/gh/bobrenjc93/554/head 2025-09-07T06:13:52.8273843Z * [new branch] gh/bobrenjc93/554/orig -> origin/gh/bobrenjc93/554/orig 2025-09-07T06:13:52.8275530Z * [new branch] gh/bobrenjc93/555/base -> origin/gh/bobrenjc93/555/base 2025-09-07T06:13:52.8276602Z * [new branch] gh/bobrenjc93/555/head -> origin/gh/bobrenjc93/555/head 2025-09-07T06:13:52.8277715Z * [new branch] gh/bobrenjc93/555/orig -> origin/gh/bobrenjc93/555/orig 2025-09-07T06:13:52.8279315Z * [new branch] gh/bobrenjc93/556/base -> origin/gh/bobrenjc93/556/base 2025-09-07T06:13:52.8280405Z * [new branch] gh/bobrenjc93/556/head -> origin/gh/bobrenjc93/556/head 2025-09-07T06:13:52.8281536Z * [new branch] gh/bobrenjc93/556/orig -> origin/gh/bobrenjc93/556/orig 2025-09-07T06:13:52.8283440Z * [new branch] gh/briancoutinho/2/base -> origin/gh/briancoutinho/2/base 2025-09-07T06:13:52.8284641Z * [new branch] gh/briancoutinho/2/head -> origin/gh/briancoutinho/2/head 2025-09-07T06:13:52.8286509Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-09-07T06:13:52.8287681Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-09-07T06:13:52.8289260Z * [new branch] gh/c00w/48/base -> origin/gh/c00w/48/base 2025-09-07T06:13:52.8290441Z * [new branch] gh/c00w/48/head -> 
origin/gh/c00w/48/head 2025-09-07T06:13:52.8291702Z * [new branch] gh/c00w/48/orig -> origin/gh/c00w/48/orig 2025-09-07T06:13:52.8293609Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-09-07T06:13:52.8294691Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-09-07T06:13:52.8295860Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-09-07T06:13:52.8297297Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-09-07T06:13:52.8298549Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-09-07T06:13:52.8299769Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-09-07T06:13:52.8301340Z * [new branch] gh/c00w/55/base -> origin/gh/c00w/55/base 2025-09-07T06:13:52.8302747Z * [new branch] gh/c00w/55/head -> origin/gh/c00w/55/head 2025-09-07T06:13:52.8304323Z * [new branch] gh/c00w/55/orig -> origin/gh/c00w/55/orig 2025-09-07T06:13:52.8305860Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-09-07T06:13:52.8307059Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-09-07T06:13:52.8308259Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-09-07T06:13:52.8310139Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-09-07T06:13:52.8311402Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-09-07T06:13:52.8312529Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-09-07T06:13:52.8314446Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-09-07T06:13:52.8315725Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-09-07T06:13:52.8317467Z * [new branch] gh/coconutruben/11/base -> origin/gh/coconutruben/11/base 2025-09-07T06:13:52.8318715Z * [new branch] gh/coconutruben/11/head -> origin/gh/coconutruben/11/head 2025-09-07T06:13:52.8319933Z * [new branch] gh/coconutruben/11/orig -> origin/gh/coconutruben/11/orig 2025-09-07T06:13:52.8322116Z * [new branch] gh/coconutruben/12/base -> 
origin/gh/coconutruben/12/base 2025-09-07T06:13:52.8323658Z * [new branch] gh/coconutruben/12/head -> origin/gh/coconutruben/12/head 2025-09-07T06:13:52.8325049Z * [new branch] gh/coconutruben/12/orig -> origin/gh/coconutruben/12/orig 2025-09-07T06:13:52.8326688Z * [new branch] gh/coconutruben/13/base -> origin/gh/coconutruben/13/base 2025-09-07T06:13:52.8327972Z * [new branch] gh/coconutruben/13/head -> origin/gh/coconutruben/13/head 2025-09-07T06:13:52.8329189Z * [new branch] gh/coconutruben/13/orig -> origin/gh/coconutruben/13/orig 2025-09-07T06:13:52.8330831Z * [new branch] gh/coconutruben/14/base -> origin/gh/coconutruben/14/base 2025-09-07T06:13:52.8332445Z * [new branch] gh/coconutruben/14/head -> origin/gh/coconutruben/14/head 2025-09-07T06:13:52.8333610Z * [new branch] gh/coconutruben/14/orig -> origin/gh/coconutruben/14/orig 2025-09-07T06:13:52.8335511Z * [new branch] gh/coconutruben/15/base -> origin/gh/coconutruben/15/base 2025-09-07T06:13:52.8336866Z * [new branch] gh/coconutruben/15/head -> origin/gh/coconutruben/15/head 2025-09-07T06:13:52.8338166Z * [new branch] gh/coconutruben/15/orig -> origin/gh/coconutruben/15/orig 2025-09-07T06:13:52.8339759Z * [new branch] gh/coconutruben/16/base -> origin/gh/coconutruben/16/base 2025-09-07T06:13:52.8340983Z * [new branch] gh/coconutruben/16/head -> origin/gh/coconutruben/16/head 2025-09-07T06:13:52.8342127Z * [new branch] gh/coconutruben/16/orig -> origin/gh/coconutruben/16/orig 2025-09-07T06:13:52.8344307Z * [new branch] gh/coconutruben/17/base -> origin/gh/coconutruben/17/base 2025-09-07T06:13:52.8345725Z * [new branch] gh/coconutruben/17/head -> origin/gh/coconutruben/17/head 2025-09-07T06:13:52.8346962Z * [new branch] gh/coconutruben/17/orig -> origin/gh/coconutruben/17/orig 2025-09-07T06:13:52.8348597Z * [new branch] gh/coconutruben/18/base -> origin/gh/coconutruben/18/base 2025-09-07T06:13:52.8350386Z * [new branch] gh/coconutruben/18/head -> origin/gh/coconutruben/18/head 2025-09-07T06:13:52.8351511Z * 
[new branch] gh/coconutruben/18/orig -> origin/gh/coconutruben/18/orig 2025-09-07T06:13:52.8353397Z * [new branch] gh/coconutruben/19/base -> origin/gh/coconutruben/19/base 2025-09-07T06:13:52.8354797Z * [new branch] gh/coconutruben/19/head -> origin/gh/coconutruben/19/head 2025-09-07T06:13:52.8355984Z * [new branch] gh/coconutruben/19/orig -> origin/gh/coconutruben/19/orig 2025-09-07T06:13:52.8357734Z * [new branch] gh/coconutruben/20/base -> origin/gh/coconutruben/20/base 2025-09-07T06:13:52.8358977Z * [new branch] gh/coconutruben/20/head -> origin/gh/coconutruben/20/head 2025-09-07T06:13:52.8360218Z * [new branch] gh/coconutruben/20/orig -> origin/gh/coconutruben/20/orig 2025-09-07T06:13:52.8362054Z * [new branch] gh/coconutruben/21/base -> origin/gh/coconutruben/21/base 2025-09-07T06:13:52.8363138Z * [new branch] gh/coconutruben/21/head -> origin/gh/coconutruben/21/head 2025-09-07T06:13:52.8364274Z * [new branch] gh/coconutruben/21/orig -> origin/gh/coconutruben/21/orig 2025-09-07T06:13:52.8365905Z * [new branch] gh/coconutruben/22/base -> origin/gh/coconutruben/22/base 2025-09-07T06:13:52.8367018Z * [new branch] gh/coconutruben/22/head -> origin/gh/coconutruben/22/head 2025-09-07T06:13:52.8368345Z * [new branch] gh/coconutruben/22/orig -> origin/gh/coconutruben/22/orig 2025-09-07T06:13:52.8370054Z * [new branch] gh/coconutruben/24/base -> origin/gh/coconutruben/24/base 2025-09-07T06:13:52.8371423Z * [new branch] gh/coconutruben/24/head -> origin/gh/coconutruben/24/head 2025-09-07T06:13:52.8372926Z * [new branch] gh/coconutruben/24/orig -> origin/gh/coconutruben/24/orig 2025-09-07T06:13:52.8374993Z * [new branch] gh/coconutruben/25/base -> origin/gh/coconutruben/25/base 2025-09-07T06:13:52.8376700Z * [new branch] gh/coconutruben/25/head -> origin/gh/coconutruben/25/head 2025-09-07T06:13:52.8378192Z * [new branch] gh/coconutruben/25/orig -> origin/gh/coconutruben/25/orig 2025-09-07T06:13:52.8379856Z * [new branch] gh/coconutruben/28/base -> 
origin/gh/coconutruben/28/base 2025-09-07T06:13:52.8381033Z * [new branch] gh/coconutruben/28/head -> origin/gh/coconutruben/28/head 2025-09-07T06:13:52.8382252Z * [new branch] gh/coconutruben/28/orig -> origin/gh/coconutruben/28/orig 2025-09-07T06:13:52.8384136Z * [new branch] gh/coconutruben/29/base -> origin/gh/coconutruben/29/base 2025-09-07T06:13:52.8385354Z * [new branch] gh/coconutruben/29/head -> origin/gh/coconutruben/29/head 2025-09-07T06:13:52.8386590Z * [new branch] gh/coconutruben/29/orig -> origin/gh/coconutruben/29/orig 2025-09-07T06:13:52.8388265Z * [new branch] gh/coconutruben/30/base -> origin/gh/coconutruben/30/base 2025-09-07T06:13:52.8389552Z * [new branch] gh/coconutruben/30/head -> origin/gh/coconutruben/30/head 2025-09-07T06:13:52.8390769Z * [new branch] gh/coconutruben/30/orig -> origin/gh/coconutruben/30/orig 2025-09-07T06:13:52.8392924Z * [new branch] gh/coconutruben/31/base -> origin/gh/coconutruben/31/base 2025-09-07T06:13:52.8394167Z * [new branch] gh/coconutruben/31/head -> origin/gh/coconutruben/31/head 2025-09-07T06:13:52.8395413Z * [new branch] gh/coconutruben/31/orig -> origin/gh/coconutruben/31/orig 2025-09-07T06:13:52.8397255Z * [new branch] gh/coconutruben/32/base -> origin/gh/coconutruben/32/base 2025-09-07T06:13:52.8398532Z * [new branch] gh/coconutruben/32/head -> origin/gh/coconutruben/32/head 2025-09-07T06:13:52.8399767Z * [new branch] gh/coconutruben/32/orig -> origin/gh/coconutruben/32/orig 2025-09-07T06:13:52.8401610Z * [new branch] gh/coconutruben/33/base -> origin/gh/coconutruben/33/base 2025-09-07T06:13:52.8402799Z * [new branch] gh/coconutruben/33/head -> origin/gh/coconutruben/33/head 2025-09-07T06:13:52.8404099Z * [new branch] gh/coconutruben/33/orig -> origin/gh/coconutruben/33/orig 2025-09-07T06:13:52.8405572Z * [new branch] gh/coconutruben/34/base -> origin/gh/coconutruben/34/base 2025-09-07T06:13:52.8406672Z * [new branch] gh/coconutruben/34/head -> origin/gh/coconutruben/34/head 2025-09-07T06:13:52.8407799Z * 
[new branch] gh/coconutruben/34/orig -> origin/gh/coconutruben/34/orig 2025-09-07T06:13:52.8409394Z * [new branch] gh/coconutruben/35/base -> origin/gh/coconutruben/35/base 2025-09-07T06:13:52.8410579Z * [new branch] gh/coconutruben/35/head -> origin/gh/coconutruben/35/head 2025-09-07T06:13:52.8411969Z * [new branch] gh/coconutruben/35/orig -> origin/gh/coconutruben/35/orig 2025-09-07T06:13:52.8415443Z * [new branch] gh/coconutruben/36/base -> origin/gh/coconutruben/36/base 2025-09-07T06:13:52.8417223Z * [new branch] gh/coconutruben/36/head -> origin/gh/coconutruben/36/head 2025-09-07T06:13:52.8419364Z * [new branch] gh/coconutruben/36/orig -> origin/gh/coconutruben/36/orig 2025-09-07T06:13:52.8421523Z * [new branch] gh/coconutruben/37/base -> origin/gh/coconutruben/37/base 2025-09-07T06:13:52.8422684Z * [new branch] gh/coconutruben/37/head -> origin/gh/coconutruben/37/head 2025-09-07T06:13:52.8423901Z * [new branch] gh/coconutruben/37/orig -> origin/gh/coconutruben/37/orig 2025-09-07T06:13:52.8425794Z * [new branch] gh/coconutruben/38/base -> origin/gh/coconutruben/38/base 2025-09-07T06:13:52.8427104Z * [new branch] gh/coconutruben/38/head -> origin/gh/coconutruben/38/head 2025-09-07T06:13:52.8428313Z * [new branch] gh/coconutruben/38/orig -> origin/gh/coconutruben/38/orig 2025-09-07T06:13:52.8430077Z * [new branch] gh/coconutruben/39/base -> origin/gh/coconutruben/39/base 2025-09-07T06:13:52.8431155Z * [new branch] gh/coconutruben/39/head -> origin/gh/coconutruben/39/head 2025-09-07T06:13:52.8432347Z * [new branch] gh/coconutruben/39/orig -> origin/gh/coconutruben/39/orig 2025-09-07T06:13:52.8434176Z * [new branch] gh/coconutruben/40/base -> origin/gh/coconutruben/40/base 2025-09-07T06:13:52.8435275Z * [new branch] gh/coconutruben/40/head -> origin/gh/coconutruben/40/head 2025-09-07T06:13:52.8436488Z * [new branch] gh/coconutruben/40/orig -> origin/gh/coconutruben/40/orig 2025-09-07T06:13:52.8438364Z * [new branch] gh/coconutruben/41/base -> 
origin/gh/coconutruben/41/base 2025-09-07T06:13:52.8439610Z * [new branch] gh/coconutruben/41/head -> origin/gh/coconutruben/41/head 2025-09-07T06:13:52.8440813Z * [new branch] gh/coconutruben/41/orig -> origin/gh/coconutruben/41/orig 2025-09-07T06:13:52.8442628Z * [new branch] gh/coconutruben/42/base -> origin/gh/coconutruben/42/base 2025-09-07T06:13:52.8443864Z * [new branch] gh/coconutruben/42/head -> origin/gh/coconutruben/42/head 2025-09-07T06:13:52.8445084Z * [new branch] gh/coconutruben/42/orig -> origin/gh/coconutruben/42/orig 2025-09-07T06:13:52.8446892Z * [new branch] gh/coconutruben/43/base -> origin/gh/coconutruben/43/base 2025-09-07T06:13:52.8448103Z * [new branch] gh/coconutruben/43/head -> origin/gh/coconutruben/43/head 2025-09-07T06:13:52.8449500Z * [new branch] gh/coconutruben/43/orig -> origin/gh/coconutruben/43/orig 2025-09-07T06:13:52.8451743Z * [new branch] gh/coconutruben/44/base -> origin/gh/coconutruben/44/base 2025-09-07T06:13:52.8453115Z * [new branch] gh/coconutruben/44/head -> origin/gh/coconutruben/44/head 2025-09-07T06:13:52.8454398Z * [new branch] gh/coconutruben/44/orig -> origin/gh/coconutruben/44/orig 2025-09-07T06:13:52.8456414Z * [new branch] gh/coconutruben/45/base -> origin/gh/coconutruben/45/base 2025-09-07T06:13:52.8457597Z * [new branch] gh/coconutruben/45/head -> origin/gh/coconutruben/45/head 2025-09-07T06:13:52.8458840Z * [new branch] gh/coconutruben/45/orig -> origin/gh/coconutruben/45/orig 2025-09-07T06:13:52.8460510Z * [new branch] gh/coconutruben/46/base -> origin/gh/coconutruben/46/base 2025-09-07T06:13:52.8461732Z * [new branch] gh/coconutruben/46/head -> origin/gh/coconutruben/46/head 2025-09-07T06:13:52.8463083Z * [new branch] gh/coconutruben/46/orig -> origin/gh/coconutruben/46/orig 2025-09-07T06:13:52.8464905Z * [new branch] gh/coconutruben/47/base -> origin/gh/coconutruben/47/base 2025-09-07T06:13:52.8466163Z * [new branch] gh/coconutruben/47/head -> origin/gh/coconutruben/47/head 2025-09-07T06:13:52.8467381Z * 
[new branch] gh/coconutruben/47/orig -> origin/gh/coconutruben/47/orig 2025-09-07T06:13:52.8469253Z * [new branch] gh/coconutruben/48/base -> origin/gh/coconutruben/48/base 2025-09-07T06:13:52.8470472Z * [new branch] gh/coconutruben/48/head -> origin/gh/coconutruben/48/head 2025-09-07T06:13:52.8471651Z * [new branch] gh/coconutruben/48/orig -> origin/gh/coconutruben/48/orig 2025-09-07T06:13:52.8473616Z * [new branch] gh/coconutruben/49/base -> origin/gh/coconutruben/49/base 2025-09-07T06:13:52.8474809Z * [new branch] gh/coconutruben/49/head -> origin/gh/coconutruben/49/head 2025-09-07T06:13:52.8476018Z * [new branch] gh/coconutruben/49/orig -> origin/gh/coconutruben/49/orig 2025-09-07T06:13:52.8477765Z * [new branch] gh/coconutruben/50/base -> origin/gh/coconutruben/50/base 2025-09-07T06:13:52.8479043Z * [new branch] gh/coconutruben/50/head -> origin/gh/coconutruben/50/head 2025-09-07T06:13:52.8480285Z * [new branch] gh/coconutruben/50/orig -> origin/gh/coconutruben/50/orig 2025-09-07T06:13:52.8481963Z * [new branch] gh/coconutruben/51/base -> origin/gh/coconutruben/51/base 2025-09-07T06:13:52.8483143Z * [new branch] gh/coconutruben/51/head -> origin/gh/coconutruben/51/head 2025-09-07T06:13:52.8484375Z * [new branch] gh/coconutruben/51/orig -> origin/gh/coconutruben/51/orig 2025-09-07T06:13:52.8486255Z * [new branch] gh/coconutruben/52/base -> origin/gh/coconutruben/52/base 2025-09-07T06:13:52.8487467Z * [new branch] gh/coconutruben/52/head -> origin/gh/coconutruben/52/head 2025-09-07T06:13:52.8488733Z * [new branch] gh/coconutruben/52/orig -> origin/gh/coconutruben/52/orig 2025-09-07T06:13:52.8490456Z * [new branch] gh/coconutruben/53/base -> origin/gh/coconutruben/53/base 2025-09-07T06:13:52.8491644Z * [new branch] gh/coconutruben/53/head -> origin/gh/coconutruben/53/head 2025-09-07T06:13:52.8493124Z * [new branch] gh/coconutruben/53/orig -> origin/gh/coconutruben/53/orig 2025-09-07T06:13:52.8494996Z * [new branch] gh/coconutruben/54/base -> 
origin/gh/coconutruben/54/base 2025-09-07T06:13:52.8496231Z * [new branch] gh/coconutruben/54/head -> origin/gh/coconutruben/54/head 2025-09-07T06:13:52.8497531Z * [new branch] gh/coconutruben/54/orig -> origin/gh/coconutruben/54/orig 2025-09-07T06:13:52.8499264Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-09-07T06:13:52.8500468Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-09-07T06:13:52.8501740Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-09-07T06:13:52.8503523Z * [new branch] gh/coconutruben/56/base -> origin/gh/coconutruben/56/base 2025-09-07T06:13:52.8504899Z * [new branch] gh/coconutruben/56/head -> origin/gh/coconutruben/56/head 2025-09-07T06:13:52.8506092Z * [new branch] gh/coconutruben/56/orig -> origin/gh/coconutruben/56/orig 2025-09-07T06:13:52.8507834Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-09-07T06:13:52.8509160Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-09-07T06:13:52.8510359Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-09-07T06:13:52.8512421Z * [new branch] gh/coconutruben/58/base -> origin/gh/coconutruben/58/base 2025-09-07T06:13:52.8513752Z * [new branch] gh/coconutruben/58/head -> origin/gh/coconutruben/58/head 2025-09-07T06:13:52.8514942Z * [new branch] gh/coconutruben/58/orig -> origin/gh/coconutruben/58/orig 2025-09-07T06:13:52.8516599Z * [new branch] gh/coconutruben/59/base -> origin/gh/coconutruben/59/base 2025-09-07T06:13:52.8517681Z * [new branch] gh/coconutruben/59/head -> origin/gh/coconutruben/59/head 2025-09-07T06:13:52.8518756Z * [new branch] gh/coconutruben/59/orig -> origin/gh/coconutruben/59/orig 2025-09-07T06:13:52.8520439Z * [new branch] gh/coconutruben/60/base -> origin/gh/coconutruben/60/base 2025-09-07T06:13:52.8521713Z * [new branch] gh/coconutruben/60/head -> origin/gh/coconutruben/60/head 2025-09-07T06:13:52.8522931Z * 
[new branch] gh/coconutruben/60/orig -> origin/gh/coconutruben/60/orig 2025-09-07T06:13:52.8524616Z * [new branch] gh/coconutruben/61/base -> origin/gh/coconutruben/61/base 2025-09-07T06:13:52.8525973Z * [new branch] gh/coconutruben/61/head -> origin/gh/coconutruben/61/head 2025-09-07T06:13:52.8527120Z * [new branch] gh/coconutruben/61/orig -> origin/gh/coconutruben/61/orig 2025-09-07T06:13:52.8528924Z * [new branch] gh/coconutruben/62/base -> origin/gh/coconutruben/62/base 2025-09-07T06:13:52.8530105Z * [new branch] gh/coconutruben/62/head -> origin/gh/coconutruben/62/head 2025-09-07T06:13:52.8531427Z * [new branch] gh/coconutruben/62/orig -> origin/gh/coconutruben/62/orig 2025-09-07T06:13:52.8533518Z * [new branch] gh/coconutruben/63/base -> origin/gh/coconutruben/63/base 2025-09-07T06:13:52.8534744Z * [new branch] gh/coconutruben/63/head -> origin/gh/coconutruben/63/head 2025-09-07T06:13:52.8535967Z * [new branch] gh/coconutruben/63/orig -> origin/gh/coconutruben/63/orig 2025-09-07T06:13:52.8537654Z * [new branch] gh/coconutruben/64/base -> origin/gh/coconutruben/64/base 2025-09-07T06:13:52.8538953Z * [new branch] gh/coconutruben/64/head -> origin/gh/coconutruben/64/head 2025-09-07T06:13:52.8540161Z * [new branch] gh/coconutruben/64/orig -> origin/gh/coconutruben/64/orig 2025-09-07T06:13:52.8541887Z * [new branch] gh/coconutruben/65/base -> origin/gh/coconutruben/65/base 2025-09-07T06:13:52.8543102Z * [new branch] gh/coconutruben/65/head -> origin/gh/coconutruben/65/head 2025-09-07T06:13:52.8544380Z * [new branch] gh/coconutruben/65/orig -> origin/gh/coconutruben/65/orig 2025-09-07T06:13:52.8546118Z * [new branch] gh/coconutruben/66/base -> origin/gh/coconutruben/66/base 2025-09-07T06:13:52.8547354Z * [new branch] gh/coconutruben/66/head -> origin/gh/coconutruben/66/head 2025-09-07T06:13:52.8548236Z * [new branch] gh/coconutruben/66/orig -> origin/gh/coconutruben/66/orig 2025-09-07T06:13:52.8554314Z * [new branch] gh/codingwithsurya/12/base -> 
origin/gh/codingwithsurya/12/base 2025-09-07T06:13:52.8555689Z * [new branch] gh/codingwithsurya/12/head -> origin/gh/codingwithsurya/12/head 2025-09-07T06:13:52.8557193Z * [new branch] gh/codingwithsurya/12/orig -> origin/gh/codingwithsurya/12/orig 2025-09-07T06:13:52.8558621Z * [new branch] gh/codingwithsurya/14/base -> origin/gh/codingwithsurya/14/base 2025-09-07T06:13:52.8559833Z * [new branch] gh/codingwithsurya/14/head -> origin/gh/codingwithsurya/14/head 2025-09-07T06:13:52.8561142Z * [new branch] gh/codingwithsurya/14/orig -> origin/gh/codingwithsurya/14/orig 2025-09-07T06:13:52.8562923Z * [new branch] gh/codingwithsurya/15/base -> origin/gh/codingwithsurya/15/base 2025-09-07T06:13:52.8564149Z * [new branch] gh/codingwithsurya/15/head -> origin/gh/codingwithsurya/15/head 2025-09-07T06:13:52.8565319Z * [new branch] gh/codingwithsurya/15/orig -> origin/gh/codingwithsurya/15/orig 2025-09-07T06:13:52.8567103Z * [new branch] gh/codingwithsurya/16/base -> origin/gh/codingwithsurya/16/base 2025-09-07T06:13:52.8568307Z * [new branch] gh/codingwithsurya/16/head -> origin/gh/codingwithsurya/16/head 2025-09-07T06:13:52.8569444Z * [new branch] gh/codingwithsurya/16/orig -> origin/gh/codingwithsurya/16/orig 2025-09-07T06:13:52.8571582Z * [new branch] gh/codingwithsurya/17/base -> origin/gh/codingwithsurya/17/base 2025-09-07T06:13:52.8573143Z * [new branch] gh/codingwithsurya/17/head -> origin/gh/codingwithsurya/17/head 2025-09-07T06:13:52.8574305Z * [new branch] gh/codingwithsurya/17/orig -> origin/gh/codingwithsurya/17/orig 2025-09-07T06:13:52.8576043Z * [new branch] gh/codingwithsurya/18/base -> origin/gh/codingwithsurya/18/base 2025-09-07T06:13:52.8577276Z * [new branch] gh/codingwithsurya/18/head -> origin/gh/codingwithsurya/18/head 2025-09-07T06:13:52.8578448Z * [new branch] gh/codingwithsurya/18/orig -> origin/gh/codingwithsurya/18/orig 2025-09-07T06:13:52.8580260Z * [new branch] gh/codingwithsurya/19/base -> origin/gh/codingwithsurya/19/base 
2025-09-07T06:13:52.8581514Z * [new branch] gh/codingwithsurya/19/head -> origin/gh/codingwithsurya/19/head 2025-09-07T06:13:52.8582661Z * [new branch] gh/codingwithsurya/19/orig -> origin/gh/codingwithsurya/19/orig 2025-09-07T06:13:52.8584446Z * [new branch] gh/codingwithsurya/20/base -> origin/gh/codingwithsurya/20/base 2025-09-07T06:13:52.8585579Z * [new branch] gh/codingwithsurya/20/head -> origin/gh/codingwithsurya/20/head 2025-09-07T06:13:52.8586709Z * [new branch] gh/codingwithsurya/20/orig -> origin/gh/codingwithsurya/20/orig 2025-09-07T06:13:52.8588498Z * [new branch] gh/codingwithsurya/21/base -> origin/gh/codingwithsurya/21/base 2025-09-07T06:13:52.8589690Z * [new branch] gh/codingwithsurya/21/head -> origin/gh/codingwithsurya/21/head 2025-09-07T06:13:52.8590835Z * [new branch] gh/codingwithsurya/21/orig -> origin/gh/codingwithsurya/21/orig 2025-09-07T06:13:52.8592710Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-09-07T06:13:52.8593827Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-09-07T06:13:52.8595202Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-09-07T06:13:52.8596306Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-09-07T06:13:52.8597677Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-09-07T06:13:52.8598735Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-09-07T06:13:52.8600139Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-09-07T06:13:52.8601248Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-09-07T06:13:52.8603265Z * [new branch] gh/davidberard98/382/base -> origin/gh/davidberard98/382/base 2025-09-07T06:13:52.8604656Z * [new branch] gh/davidberard98/382/head -> origin/gh/davidberard98/382/head 2025-09-07T06:13:52.8605717Z * [new branch] gh/davidberard98/382/orig -> origin/gh/davidberard98/382/orig 2025-09-07T06:13:52.8607261Z * 
[new branch] gh/davidberard98/386/base -> origin/gh/davidberard98/386/base 2025-09-07T06:13:52.8608432Z * [new branch] gh/davidberard98/386/head -> origin/gh/davidberard98/386/head 2025-09-07T06:13:52.8609580Z * [new branch] gh/davidberard98/386/orig -> origin/gh/davidberard98/386/orig 2025-09-07T06:13:52.8611168Z * [new branch] gh/davidberard98/391/base -> origin/gh/davidberard98/391/base 2025-09-07T06:13:52.8612542Z * [new branch] gh/davidberard98/391/head -> origin/gh/davidberard98/391/head 2025-09-07T06:13:52.8613737Z * [new branch] gh/davidberard98/391/orig -> origin/gh/davidberard98/391/orig 2025-09-07T06:13:52.8615376Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-09-07T06:13:52.8616494Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-09-07T06:13:52.8617719Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-09-07T06:13:52.8619509Z * [new branch] gh/davidberard98/394/base -> origin/gh/davidberard98/394/base 2025-09-07T06:13:52.8620743Z * [new branch] gh/davidberard98/394/head -> origin/gh/davidberard98/394/head 2025-09-07T06:13:52.8621924Z * [new branch] gh/davidberard98/394/orig -> origin/gh/davidberard98/394/orig 2025-09-07T06:13:52.8623515Z * [new branch] gh/davidberard98/396/base -> origin/gh/davidberard98/396/base 2025-09-07T06:13:52.8624741Z * [new branch] gh/davidberard98/396/head -> origin/gh/davidberard98/396/head 2025-09-07T06:13:52.8625845Z * [new branch] gh/davidberard98/396/orig -> origin/gh/davidberard98/396/orig 2025-09-07T06:13:52.8627667Z * [new branch] gh/davidberard98/397/base -> origin/gh/davidberard98/397/base 2025-09-07T06:13:52.8628817Z * [new branch] gh/davidberard98/397/head -> origin/gh/davidberard98/397/head 2025-09-07T06:13:52.8630015Z * [new branch] gh/davidberard98/397/orig -> origin/gh/davidberard98/397/orig 2025-09-07T06:13:52.8631633Z * [new branch] gh/davidberard98/398/base -> origin/gh/davidberard98/398/base 
2025-09-07T06:13:52.8633080Z * [new branch] gh/davidberard98/398/head -> origin/gh/davidberard98/398/head 2025-09-07T06:13:52.8633800Z * [new branch] gh/davidberard98/398/orig -> origin/gh/davidberard98/398/orig 2025-09-07T06:13:52.8635467Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-09-07T06:13:52.8636695Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-09-07T06:13:52.8638307Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-09-07T06:13:52.8640080Z * [new branch] gh/davidberard98/400/base -> origin/gh/davidberard98/400/base 2025-09-07T06:13:52.8641305Z * [new branch] gh/davidberard98/400/head -> origin/gh/davidberard98/400/head 2025-09-07T06:13:52.8642454Z * [new branch] gh/davidberard98/400/orig -> origin/gh/davidberard98/400/orig 2025-09-07T06:13:52.8643957Z * [new branch] gh/davidberard98/401/base -> origin/gh/davidberard98/401/base 2025-09-07T06:13:52.8645074Z * [new branch] gh/davidberard98/401/head -> origin/gh/davidberard98/401/head 2025-09-07T06:13:52.8646179Z * [new branch] gh/davidberard98/401/orig -> origin/gh/davidberard98/401/orig 2025-09-07T06:13:52.8647708Z * [new branch] gh/davidberard98/402/base -> origin/gh/davidberard98/402/base 2025-09-07T06:13:52.8649103Z * [new branch] gh/davidberard98/402/head -> origin/gh/davidberard98/402/head 2025-09-07T06:13:52.8650468Z * [new branch] gh/davidberard98/402/orig -> origin/gh/davidberard98/402/orig 2025-09-07T06:13:52.8652218Z * [new branch] gh/davidberard98/403/base -> origin/gh/davidberard98/403/base 2025-09-07T06:13:52.8653380Z * [new branch] gh/davidberard98/403/head -> origin/gh/davidberard98/403/head 2025-09-07T06:13:52.8654527Z * [new branch] gh/davidberard98/403/orig -> origin/gh/davidberard98/403/orig 2025-09-07T06:13:52.8656267Z * [new branch] gh/davidberard98/404/base -> origin/gh/davidberard98/404/base 2025-09-07T06:13:52.8657380Z * [new branch] gh/davidberard98/404/head -> 
origin/gh/davidberard98/404/head 2025-09-07T06:13:52.8658505Z * [new branch] gh/davidberard98/404/orig -> origin/gh/davidberard98/404/orig 2025-09-07T06:13:52.8660183Z * [new branch] gh/davidberard98/405/base -> origin/gh/davidberard98/405/base 2025-09-07T06:13:52.8661368Z * [new branch] gh/davidberard98/405/head -> origin/gh/davidberard98/405/head 2025-09-07T06:13:52.8662550Z * [new branch] gh/davidberard98/405/orig -> origin/gh/davidberard98/405/orig 2025-09-07T06:13:52.8664371Z * [new branch] gh/davidberard98/406/base -> origin/gh/davidberard98/406/base 2025-09-07T06:13:52.8665688Z * [new branch] gh/davidberard98/406/head -> origin/gh/davidberard98/406/head 2025-09-07T06:13:52.8666931Z * [new branch] gh/davidberard98/406/orig -> origin/gh/davidberard98/406/orig 2025-09-07T06:13:52.8669072Z * [new branch] gh/davidberard98/407/base -> origin/gh/davidberard98/407/base 2025-09-07T06:13:52.8670157Z * [new branch] gh/davidberard98/407/head -> origin/gh/davidberard98/407/head 2025-09-07T06:13:52.8671282Z * [new branch] gh/davidberard98/407/orig -> origin/gh/davidberard98/407/orig 2025-09-07T06:13:52.8672877Z * [new branch] gh/davidberard98/408/base -> origin/gh/davidberard98/408/base 2025-09-07T06:13:52.8673975Z * [new branch] gh/davidberard98/408/head -> origin/gh/davidberard98/408/head 2025-09-07T06:13:52.8675104Z * [new branch] gh/davidberard98/408/orig -> origin/gh/davidberard98/408/orig 2025-09-07T06:13:52.8676555Z * [new branch] gh/davidberard98/409/base -> origin/gh/davidberard98/409/base 2025-09-07T06:13:52.8677810Z * [new branch] gh/davidberard98/409/head -> origin/gh/davidberard98/409/head 2025-09-07T06:13:52.8679046Z * [new branch] gh/davidberard98/409/orig -> origin/gh/davidberard98/409/orig 2025-09-07T06:13:52.8680863Z * [new branch] gh/desertfire/594/base -> origin/gh/desertfire/594/base 2025-09-07T06:13:52.8681955Z * [new branch] gh/desertfire/594/head -> origin/gh/desertfire/594/head 2025-09-07T06:13:52.8683136Z * [new branch] gh/desertfire/594/orig -> 
origin/gh/desertfire/594/orig 2025-09-07T06:13:52.8684637Z * [new branch] gh/desertfire/595/base -> origin/gh/desertfire/595/base 2025-09-07T06:13:52.8685729Z * [new branch] gh/desertfire/595/head -> origin/gh/desertfire/595/head 2025-09-07T06:13:52.8686853Z * [new branch] gh/desertfire/595/orig -> origin/gh/desertfire/595/orig 2025-09-07T06:13:52.8688387Z * [new branch] gh/desertfire/597/base -> origin/gh/desertfire/597/base 2025-09-07T06:13:52.8689539Z * [new branch] gh/desertfire/597/head -> origin/gh/desertfire/597/head 2025-09-07T06:13:52.8690739Z * [new branch] gh/desertfire/597/orig -> origin/gh/desertfire/597/orig 2025-09-07T06:13:52.8692929Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-09-07T06:13:52.8694137Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-09-07T06:13:52.8696130Z * [new branch] gh/drisspg/149/base -> origin/gh/drisspg/149/base 2025-09-07T06:13:52.8697189Z * [new branch] gh/drisspg/149/head -> origin/gh/drisspg/149/head 2025-09-07T06:13:52.8698357Z * [new branch] gh/drisspg/149/orig -> origin/gh/drisspg/149/orig 2025-09-07T06:13:52.8699922Z * [new branch] gh/drisspg/159/base -> origin/gh/drisspg/159/base 2025-09-07T06:13:52.8701064Z * [new branch] gh/drisspg/159/head -> origin/gh/drisspg/159/head 2025-09-07T06:13:52.8702279Z * [new branch] gh/drisspg/159/orig -> origin/gh/drisspg/159/orig 2025-09-07T06:13:52.8703967Z * [new branch] gh/drisspg/166/base -> origin/gh/drisspg/166/base 2025-09-07T06:13:52.8705091Z * [new branch] gh/drisspg/166/head -> origin/gh/drisspg/166/head 2025-09-07T06:13:52.8706226Z * [new branch] gh/drisspg/166/orig -> origin/gh/drisspg/166/orig 2025-09-07T06:13:52.8707711Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-09-07T06:13:52.8709008Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-09-07T06:13:52.8710163Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-09-07T06:13:52.8711679Z * [new branch] 
gh/drisspg/173/base -> origin/gh/drisspg/173/base 2025-09-07T06:13:52.8712787Z * [new branch] gh/drisspg/173/head -> origin/gh/drisspg/173/head 2025-09-07T06:13:52.8713954Z * [new branch] gh/drisspg/173/orig -> origin/gh/drisspg/173/orig 2025-09-07T06:13:52.8715449Z * [new branch] gh/drisspg/177/base -> origin/gh/drisspg/177/base 2025-09-07T06:13:52.8716580Z * [new branch] gh/drisspg/177/head -> origin/gh/drisspg/177/head 2025-09-07T06:13:52.8717703Z * [new branch] gh/drisspg/177/orig -> origin/gh/drisspg/177/orig 2025-09-07T06:13:52.8719217Z * [new branch] gh/drisspg/178/base -> origin/gh/drisspg/178/base 2025-09-07T06:13:52.8720340Z * [new branch] gh/drisspg/178/head -> origin/gh/drisspg/178/head 2025-09-07T06:13:52.8721391Z * [new branch] gh/drisspg/178/orig -> origin/gh/drisspg/178/orig 2025-09-07T06:13:52.8722902Z * [new branch] gh/drisspg/180/base -> origin/gh/drisspg/180/base 2025-09-07T06:13:52.8724034Z * [new branch] gh/drisspg/180/head -> origin/gh/drisspg/180/head 2025-09-07T06:13:52.8725134Z * [new branch] gh/drisspg/180/orig -> origin/gh/drisspg/180/orig 2025-09-07T06:13:52.8726624Z * [new branch] gh/drisspg/181/base -> origin/gh/drisspg/181/base 2025-09-07T06:13:52.8727789Z * [new branch] gh/drisspg/181/head -> origin/gh/drisspg/181/head 2025-09-07T06:13:52.8728901Z * [new branch] gh/drisspg/181/orig -> origin/gh/drisspg/181/orig 2025-09-07T06:13:52.8730437Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-09-07T06:13:52.8732002Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-09-07T06:13:52.8733327Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-09-07T06:13:52.8734427Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-09-07T06:13:52.8735853Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-09-07T06:13:52.8736927Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-09-07T06:13:52.8738532Z * [new branch] gh/drisspg/185/base -> 
origin/gh/drisspg/185/base 2025-09-07T06:13:52.8739703Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-09-07T06:13:52.8741227Z * [new branch] gh/drisspg/186/base -> origin/gh/drisspg/186/base 2025-09-07T06:13:52.8742510Z * [new branch] gh/drisspg/186/head -> origin/gh/drisspg/186/head 2025-09-07T06:13:52.8743579Z * [new branch] gh/drisspg/186/orig -> origin/gh/drisspg/186/orig 2025-09-07T06:13:52.8745201Z * [new branch] gh/drisspg/187/base -> origin/gh/drisspg/187/base 2025-09-07T06:13:52.8746312Z * [new branch] gh/drisspg/187/head -> origin/gh/drisspg/187/head 2025-09-07T06:13:52.8747423Z * [new branch] gh/drisspg/187/orig -> origin/gh/drisspg/187/orig 2025-09-07T06:13:52.8749106Z * [new branch] gh/drisspg/188/base -> origin/gh/drisspg/188/base 2025-09-07T06:13:52.8752417Z * [new branch] gh/drisspg/188/head -> origin/gh/drisspg/188/head 2025-09-07T06:13:52.8753580Z * [new branch] gh/drisspg/188/orig -> origin/gh/drisspg/188/orig 2025-09-07T06:13:52.8755585Z * [new branch] gh/drisspg/189/base -> origin/gh/drisspg/189/base 2025-09-07T06:13:52.8756767Z * [new branch] gh/drisspg/189/head -> origin/gh/drisspg/189/head 2025-09-07T06:13:52.8757921Z * [new branch] gh/drisspg/189/orig -> origin/gh/drisspg/189/orig 2025-09-07T06:13:52.8759526Z * [new branch] gh/drisspg/190/base -> origin/gh/drisspg/190/base 2025-09-07T06:13:52.8760674Z * [new branch] gh/drisspg/190/head -> origin/gh/drisspg/190/head 2025-09-07T06:13:52.8761899Z * [new branch] gh/drisspg/190/orig -> origin/gh/drisspg/190/orig 2025-09-07T06:13:52.8763474Z * [new branch] gh/drisspg/191/base -> origin/gh/drisspg/191/base 2025-09-07T06:13:52.8764588Z * [new branch] gh/drisspg/191/head -> origin/gh/drisspg/191/head 2025-09-07T06:13:52.8765698Z * [new branch] gh/drisspg/191/orig -> origin/gh/drisspg/191/orig 2025-09-07T06:13:52.8767283Z * [new branch] gh/drisspg/192/base -> origin/gh/drisspg/192/base 2025-09-07T06:13:52.8768326Z * [new branch] gh/drisspg/192/head -> 
origin/gh/drisspg/192/head 2025-09-07T06:13:52.8769438Z * [new branch] gh/drisspg/192/orig -> origin/gh/drisspg/192/orig 2025-09-07T06:13:52.8771034Z * [new branch] gh/drisspg/193/base -> origin/gh/drisspg/193/base 2025-09-07T06:13:52.8772468Z * [new branch] gh/drisspg/193/head -> origin/gh/drisspg/193/head 2025-09-07T06:13:52.8773726Z * [new branch] gh/drisspg/193/orig -> origin/gh/drisspg/193/orig 2025-09-07T06:13:52.8775258Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-09-07T06:13:52.8776429Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-09-07T06:13:52.8777757Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-09-07T06:13:52.8779421Z * [new branch] gh/drisspg/195/base -> origin/gh/drisspg/195/base 2025-09-07T06:13:52.8780551Z * [new branch] gh/drisspg/195/head -> origin/gh/drisspg/195/head 2025-09-07T06:13:52.8781694Z * [new branch] gh/drisspg/195/orig -> origin/gh/drisspg/195/orig 2025-09-07T06:13:52.8783313Z * [new branch] gh/drisspg/196/base -> origin/gh/drisspg/196/base 2025-09-07T06:13:52.8784549Z * [new branch] gh/drisspg/196/head -> origin/gh/drisspg/196/head 2025-09-07T06:13:52.8785733Z * [new branch] gh/drisspg/196/orig -> origin/gh/drisspg/196/orig 2025-09-07T06:13:52.8787284Z * [new branch] gh/drisspg/197/base -> origin/gh/drisspg/197/base 2025-09-07T06:13:52.8788385Z * [new branch] gh/drisspg/197/head -> origin/gh/drisspg/197/head 2025-09-07T06:13:52.8789509Z * [new branch] gh/drisspg/197/orig -> origin/gh/drisspg/197/orig 2025-09-07T06:13:52.8791138Z * [new branch] gh/drisspg/198/base -> origin/gh/drisspg/198/base 2025-09-07T06:13:52.8792211Z * [new branch] gh/drisspg/198/head -> origin/gh/drisspg/198/head 2025-09-07T06:13:52.8793343Z * [new branch] gh/drisspg/198/orig -> origin/gh/drisspg/198/orig 2025-09-07T06:13:52.8794854Z * [new branch] gh/drisspg/199/base -> origin/gh/drisspg/199/base 2025-09-07T06:13:52.8795969Z * [new branch] gh/drisspg/199/head -> 
origin/gh/drisspg/199/head 2025-09-07T06:13:52.8797088Z * [new branch] gh/drisspg/199/orig -> origin/gh/drisspg/199/orig 2025-09-07T06:13:52.8798921Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-09-07T06:13:52.8800068Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-09-07T06:13:52.8801919Z * [new branch] gh/eellison/784/base -> origin/gh/eellison/784/base 2025-09-07T06:13:52.8803046Z * [new branch] gh/eellison/784/head -> origin/gh/eellison/784/head 2025-09-07T06:13:52.8804168Z * [new branch] gh/eellison/784/orig -> origin/gh/eellison/784/orig 2025-09-07T06:13:52.8805953Z * [new branch] gh/eellison/785/base -> origin/gh/eellison/785/base 2025-09-07T06:13:52.8807065Z * [new branch] gh/eellison/785/head -> origin/gh/eellison/785/head 2025-09-07T06:13:52.8808185Z * [new branch] gh/eellison/785/orig -> origin/gh/eellison/785/orig 2025-09-07T06:13:52.8809720Z * [new branch] gh/eellison/789/base -> origin/gh/eellison/789/base 2025-09-07T06:13:52.8810819Z * [new branch] gh/eellison/789/head -> origin/gh/eellison/789/head 2025-09-07T06:13:52.8812285Z * [new branch] gh/eellison/789/orig -> origin/gh/eellison/789/orig 2025-09-07T06:13:52.8813864Z * [new branch] gh/eellison/800/base -> origin/gh/eellison/800/base 2025-09-07T06:13:52.8815023Z * [new branch] gh/eellison/800/head -> origin/gh/eellison/800/head 2025-09-07T06:13:52.8816236Z * [new branch] gh/eellison/800/orig -> origin/gh/eellison/800/orig 2025-09-07T06:13:52.8817794Z * [new branch] gh/eellison/801/base -> origin/gh/eellison/801/base 2025-09-07T06:13:52.8818921Z * [new branch] gh/eellison/801/head -> origin/gh/eellison/801/head 2025-09-07T06:13:52.8820116Z * [new branch] gh/eellison/801/orig -> origin/gh/eellison/801/orig 2025-09-07T06:13:52.8821695Z * [new branch] gh/eellison/802/base -> origin/gh/eellison/802/base 2025-09-07T06:13:52.8822833Z * [new branch] gh/eellison/802/head -> origin/gh/eellison/802/head 2025-09-07T06:13:52.8824107Z * [new branch] 
gh/eellison/802/orig -> origin/gh/eellison/802/orig 2025-09-07T06:13:52.8825656Z * [new branch] gh/eellison/805/base -> origin/gh/eellison/805/base 2025-09-07T06:13:52.8826835Z * [new branch] gh/eellison/805/head -> origin/gh/eellison/805/head 2025-09-07T06:13:52.8827952Z * [new branch] gh/eellison/805/orig -> origin/gh/eellison/805/orig 2025-09-07T06:13:52.8829576Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-09-07T06:13:52.8830764Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-09-07T06:13:52.8831868Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-09-07T06:13:52.8833397Z * [new branch] gh/eellison/809/base -> origin/gh/eellison/809/base 2025-09-07T06:13:52.8834517Z * [new branch] gh/eellison/809/head -> origin/gh/eellison/809/head 2025-09-07T06:13:52.8835615Z * [new branch] gh/eellison/809/orig -> origin/gh/eellison/809/orig 2025-09-07T06:13:52.8837205Z * [new branch] gh/eellison/813/base -> origin/gh/eellison/813/base 2025-09-07T06:13:52.8838294Z * [new branch] gh/eellison/813/head -> origin/gh/eellison/813/head 2025-09-07T06:13:52.8839390Z * [new branch] gh/eellison/813/orig -> origin/gh/eellison/813/orig 2025-09-07T06:13:52.8840932Z * [new branch] gh/eellison/814/base -> origin/gh/eellison/814/base 2025-09-07T06:13:52.8842082Z * [new branch] gh/eellison/814/head -> origin/gh/eellison/814/head 2025-09-07T06:13:52.8843197Z * [new branch] gh/eellison/814/orig -> origin/gh/eellison/814/orig 2025-09-07T06:13:52.8845527Z * [new branch] gh/eellison/815/base -> origin/gh/eellison/815/base 2025-09-07T06:13:52.8846407Z * [new branch] gh/eellison/815/head -> origin/gh/eellison/815/head 2025-09-07T06:13:52.8847592Z * [new branch] gh/eellison/815/orig -> origin/gh/eellison/815/orig 2025-09-07T06:13:52.8849485Z * [new branch] gh/eellison/816/base -> origin/gh/eellison/816/base 2025-09-07T06:13:52.8850726Z * [new branch] gh/eellison/816/head -> origin/gh/eellison/816/head 
2025-09-07T06:13:52.8852102Z * [new branch] gh/eellison/816/orig -> origin/gh/eellison/816/orig 2025-09-07T06:13:52.8853712Z * [new branch] gh/eellison/817/base -> origin/gh/eellison/817/base 2025-09-07T06:13:52.8854849Z * [new branch] gh/eellison/817/head -> origin/gh/eellison/817/head 2025-09-07T06:13:52.8855912Z * [new branch] gh/eellison/817/orig -> origin/gh/eellison/817/orig 2025-09-07T06:13:52.8857551Z * [new branch] gh/eellison/818/base -> origin/gh/eellison/818/base 2025-09-07T06:13:52.8858760Z * [new branch] gh/eellison/818/head -> origin/gh/eellison/818/head 2025-09-07T06:13:52.8859929Z * [new branch] gh/eellison/818/orig -> origin/gh/eellison/818/orig 2025-09-07T06:13:52.8861745Z * [new branch] gh/eellison/819/base -> origin/gh/eellison/819/base 2025-09-07T06:13:52.8862861Z * [new branch] gh/eellison/819/head -> origin/gh/eellison/819/head 2025-09-07T06:13:52.8864118Z * [new branch] gh/eellison/819/orig -> origin/gh/eellison/819/orig 2025-09-07T06:13:52.8866283Z * [new branch] gh/eellison/820/base -> origin/gh/eellison/820/base 2025-09-07T06:13:52.8867540Z * [new branch] gh/eellison/820/head -> origin/gh/eellison/820/head 2025-09-07T06:13:52.8868760Z * [new branch] gh/eellison/820/orig -> origin/gh/eellison/820/orig 2025-09-07T06:13:52.8870229Z * [new branch] gh/eellison/821/base -> origin/gh/eellison/821/base 2025-09-07T06:13:52.8871387Z * [new branch] gh/eellison/821/head -> origin/gh/eellison/821/head 2025-09-07T06:13:52.8872597Z * [new branch] gh/eellison/821/orig -> origin/gh/eellison/821/orig 2025-09-07T06:13:52.8874157Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-09-07T06:13:52.8875286Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-09-07T06:13:52.8876417Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-09-07T06:13:52.8878042Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-09-07T06:13:52.8879181Z * [new branch] gh/eellison/823/head -> 
origin/gh/eellison/823/head 2025-09-07T06:13:52.8880312Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-09-07T06:13:52.8882085Z * [new branch] gh/etaf/132/base -> origin/gh/etaf/132/base 2025-09-07T06:13:52.8883221Z * [new branch] gh/etaf/132/head -> origin/gh/etaf/132/head 2025-09-07T06:13:52.8884440Z * [new branch] gh/etaf/132/orig -> origin/gh/etaf/132/orig 2025-09-07T06:13:52.8885847Z * [new branch] gh/etaf/138/base -> origin/gh/etaf/138/base 2025-09-07T06:13:52.8886957Z * [new branch] gh/etaf/138/head -> origin/gh/etaf/138/head 2025-09-07T06:13:52.8888092Z * [new branch] gh/etaf/138/orig -> origin/gh/etaf/138/orig 2025-09-07T06:13:52.8889676Z * [new branch] gh/etaf/140/base -> origin/gh/etaf/140/base 2025-09-07T06:13:52.8890795Z * [new branch] gh/etaf/140/head -> origin/gh/etaf/140/head 2025-09-07T06:13:52.8892201Z * [new branch] gh/etaf/140/orig -> origin/gh/etaf/140/orig 2025-09-07T06:13:52.8893868Z * [new branch] gh/etaf/143/base -> origin/gh/etaf/143/base 2025-09-07T06:13:52.8895020Z * [new branch] gh/etaf/143/head -> origin/gh/etaf/143/head 2025-09-07T06:13:52.8896170Z * [new branch] gh/etaf/143/orig -> origin/gh/etaf/143/orig 2025-09-07T06:13:52.8897776Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-09-07T06:13:52.8898950Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-09-07T06:13:52.8900645Z * [new branch] gh/etaf/151/base -> origin/gh/etaf/151/base 2025-09-07T06:13:52.8901995Z * [new branch] gh/etaf/151/head -> origin/gh/etaf/151/head 2025-09-07T06:13:52.8903209Z * [new branch] gh/etaf/151/orig -> origin/gh/etaf/151/orig 2025-09-07T06:13:52.8905045Z * [new branch] gh/etaf/152/base -> origin/gh/etaf/152/base 2025-09-07T06:13:52.8906273Z * [new branch] gh/etaf/152/head -> origin/gh/etaf/152/head 2025-09-07T06:13:52.8907430Z * [new branch] gh/etaf/152/orig -> origin/gh/etaf/152/orig 2025-09-07T06:13:52.8909069Z * [new branch] gh/etaf/153/base -> origin/gh/etaf/153/base 
2025-09-07T06:13:52.8910290Z * [new branch] gh/etaf/153/head -> origin/gh/etaf/153/head 2025-09-07T06:13:52.8911425Z * [new branch] gh/etaf/153/orig -> origin/gh/etaf/153/orig 2025-09-07T06:13:52.8913164Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-09-07T06:13:52.8914459Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-09-07T06:13:52.8915532Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-09-07T06:13:52.8917129Z * [new branch] gh/etaf/155/base -> origin/gh/etaf/155/base 2025-09-07T06:13:52.8918318Z * [new branch] gh/etaf/155/head -> origin/gh/etaf/155/head 2025-09-07T06:13:52.8919452Z * [new branch] gh/etaf/155/orig -> origin/gh/etaf/155/orig 2025-09-07T06:13:52.8920914Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-09-07T06:13:52.8922041Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-09-07T06:13:52.8923151Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-09-07T06:13:52.8924865Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-09-07T06:13:52.8926053Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-09-07T06:13:52.8927212Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-09-07T06:13:52.8928683Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-09-07T06:13:52.8929853Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-09-07T06:13:52.8930978Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-09-07T06:13:52.8933062Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-09-07T06:13:52.8934178Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-09-07T06:13:52.8935334Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-09-07T06:13:52.8937040Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-09-07T06:13:52.8938274Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-09-07T06:13:52.8939430Z * [new branch] gh/etaf/160/orig -> 
origin/gh/etaf/160/orig 2025-09-07T06:13:52.8941104Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-09-07T06:13:52.8942305Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-09-07T06:13:52.8943480Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-09-07T06:13:52.8945562Z * [new branch] gh/etaf/162/base -> origin/gh/etaf/162/base 2025-09-07T06:13:52.8946711Z * [new branch] gh/etaf/162/head -> origin/gh/etaf/162/head 2025-09-07T06:13:52.8947824Z * [new branch] gh/etaf/162/orig -> origin/gh/etaf/162/orig 2025-09-07T06:13:52.8949837Z * [new branch] gh/etaf/163/base -> origin/gh/etaf/163/base 2025-09-07T06:13:52.8950998Z * [new branch] gh/etaf/163/head -> origin/gh/etaf/163/head 2025-09-07T06:13:52.8952255Z * [new branch] gh/etaf/163/orig -> origin/gh/etaf/163/orig 2025-09-07T06:13:52.8953972Z * [new branch] gh/etaf/164/base -> origin/gh/etaf/164/base 2025-09-07T06:13:52.8955184Z * [new branch] gh/etaf/164/head -> origin/gh/etaf/164/head 2025-09-07T06:13:52.8956349Z * [new branch] gh/etaf/164/orig -> origin/gh/etaf/164/orig 2025-09-07T06:13:52.8957947Z * [new branch] gh/etaf/165/base -> origin/gh/etaf/165/base 2025-09-07T06:13:52.8959141Z * [new branch] gh/etaf/165/orig -> origin/gh/etaf/165/orig 2025-09-07T06:13:52.8960825Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-09-07T06:13:52.8962118Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-09-07T06:13:52.8963256Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-09-07T06:13:52.8964874Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-09-07T06:13:52.8966093Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-09-07T06:13:52.8967235Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-09-07T06:13:52.8968865Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-09-07T06:13:52.8970086Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-09-07T06:13:52.8971260Z * [new 
branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-09-07T06:13:52.8973294Z * [new branch] gh/etaf/169/base -> origin/gh/etaf/169/base 2025-09-07T06:13:52.8974437Z * [new branch] gh/etaf/169/head -> origin/gh/etaf/169/head 2025-09-07T06:13:52.8975588Z * [new branch] gh/etaf/169/orig -> origin/gh/etaf/169/orig 2025-09-07T06:13:52.8977546Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-09-07T06:13:52.8978721Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-09-07T06:13:52.8980354Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-09-07T06:13:52.8981281Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-09-07T06:13:52.8983228Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-09-07T06:13:52.8984310Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-09-07T06:13:52.8985891Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-09-07T06:13:52.8987176Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-09-07T06:13:52.8988981Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-09-07T06:13:52.8990150Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-09-07T06:13:52.8991284Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-09-07T06:13:52.8992808Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-09-07T06:13:52.8993891Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-09-07T06:13:52.8995103Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-09-07T06:13:52.8996587Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-09-07T06:13:52.8997678Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-09-07T06:13:52.8998877Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-09-07T06:13:52.9000385Z * [new 
branch] gh/ezyang/3074/base -> origin/gh/ezyang/3074/base 2025-09-07T06:13:52.9001520Z * [new branch] gh/ezyang/3074/head -> origin/gh/ezyang/3074/head 2025-09-07T06:13:52.9002648Z * [new branch] gh/ezyang/3074/orig -> origin/gh/ezyang/3074/orig 2025-09-07T06:13:52.9004121Z * [new branch] gh/ezyang/3088/base -> origin/gh/ezyang/3088/base 2025-09-07T06:13:52.9005238Z * [new branch] gh/ezyang/3088/head -> origin/gh/ezyang/3088/head 2025-09-07T06:13:52.9006383Z * [new branch] gh/ezyang/3088/orig -> origin/gh/ezyang/3088/orig 2025-09-07T06:13:52.9007913Z * [new branch] gh/ezyang/3092/base -> origin/gh/ezyang/3092/base 2025-09-07T06:13:52.9009100Z * [new branch] gh/ezyang/3092/head -> origin/gh/ezyang/3092/head 2025-09-07T06:13:52.9010247Z * [new branch] gh/ezyang/3092/orig -> origin/gh/ezyang/3092/orig 2025-09-07T06:13:52.9012012Z * [new branch] gh/ezyang/3103/base -> origin/gh/ezyang/3103/base 2025-09-07T06:13:52.9013224Z * [new branch] gh/ezyang/3103/head -> origin/gh/ezyang/3103/head 2025-09-07T06:13:52.9014386Z * [new branch] gh/ezyang/3103/orig -> origin/gh/ezyang/3103/orig 2025-09-07T06:13:52.9015912Z * [new branch] gh/ezyang/3105/base -> origin/gh/ezyang/3105/base 2025-09-07T06:13:52.9017555Z * [new branch] gh/ezyang/3105/head -> origin/gh/ezyang/3105/head 2025-09-07T06:13:52.9018721Z * [new branch] gh/ezyang/3105/orig -> origin/gh/ezyang/3105/orig 2025-09-07T06:13:52.9020323Z * [new branch] gh/ezyang/3114/base -> origin/gh/ezyang/3114/base 2025-09-07T06:13:52.9021556Z * [new branch] gh/ezyang/3114/head -> origin/gh/ezyang/3114/head 2025-09-07T06:13:52.9022709Z * [new branch] gh/ezyang/3114/orig -> origin/gh/ezyang/3114/orig 2025-09-07T06:13:52.9024348Z * [new branch] gh/ezyang/3116/base -> origin/gh/ezyang/3116/base 2025-09-07T06:13:52.9025507Z * [new branch] gh/ezyang/3116/head -> origin/gh/ezyang/3116/head 2025-09-07T06:13:52.9026624Z * [new branch] gh/ezyang/3116/orig -> origin/gh/ezyang/3116/orig 2025-09-07T06:13:52.9028127Z * [new branch] 
gh/ezyang/3120/base -> origin/gh/ezyang/3120/base 2025-09-07T06:13:52.9029230Z * [new branch] gh/ezyang/3120/head -> origin/gh/ezyang/3120/head 2025-09-07T06:13:52.9030856Z * [new branch] gh/ezyang/3120/orig -> origin/gh/ezyang/3120/orig 2025-09-07T06:13:52.9031870Z * [new branch] gh/ezyang/3122/base -> origin/gh/ezyang/3122/base 2025-09-07T06:13:52.9032975Z * [new branch] gh/ezyang/3122/head -> origin/gh/ezyang/3122/head 2025-09-07T06:13:52.9034117Z * [new branch] gh/ezyang/3122/orig -> origin/gh/ezyang/3122/orig 2025-09-07T06:13:52.9035592Z * [new branch] gh/ezyang/3123/base -> origin/gh/ezyang/3123/base 2025-09-07T06:13:52.9036671Z * [new branch] gh/ezyang/3123/head -> origin/gh/ezyang/3123/head 2025-09-07T06:13:52.9037816Z * [new branch] gh/ezyang/3123/orig -> origin/gh/ezyang/3123/orig 2025-09-07T06:13:52.9039298Z * [new branch] gh/ezyang/3125/base -> origin/gh/ezyang/3125/base 2025-09-07T06:13:52.9040414Z * [new branch] gh/ezyang/3125/head -> origin/gh/ezyang/3125/head 2025-09-07T06:13:52.9041519Z * [new branch] gh/ezyang/3125/orig -> origin/gh/ezyang/3125/orig 2025-09-07T06:13:52.9043034Z * [new branch] gh/ezyang/3126/base -> origin/gh/ezyang/3126/base 2025-09-07T06:13:52.9044130Z * [new branch] gh/ezyang/3126/head -> origin/gh/ezyang/3126/head 2025-09-07T06:13:52.9045273Z * [new branch] gh/ezyang/3126/orig -> origin/gh/ezyang/3126/orig 2025-09-07T06:13:52.9047214Z * [new branch] gh/ezyang/3127/base -> origin/gh/ezyang/3127/base 2025-09-07T06:13:52.9048312Z * [new branch] gh/ezyang/3127/head -> origin/gh/ezyang/3127/head 2025-09-07T06:13:52.9051072Z * [new branch] gh/ezyang/3127/orig -> origin/gh/ezyang/3127/orig 2025-09-07T06:13:52.9053720Z * [new branch] gh/ezyang/3128/base -> origin/gh/ezyang/3128/base 2025-09-07T06:13:52.9054967Z * [new branch] gh/ezyang/3128/head -> origin/gh/ezyang/3128/head 2025-09-07T06:13:52.9056166Z * [new branch] gh/ezyang/3128/orig -> origin/gh/ezyang/3128/orig 2025-09-07T06:13:52.9057806Z * [new branch] gh/ezyang/3129/base -> 
origin/gh/ezyang/3129/base 2025-09-07T06:13:52.9058958Z * [new branch] gh/ezyang/3129/head -> origin/gh/ezyang/3129/head 2025-09-07T06:13:52.9060131Z * [new branch] gh/ezyang/3129/orig -> origin/gh/ezyang/3129/orig 2025-09-07T06:13:52.9061749Z * [new branch] gh/ezyang/3130/base -> origin/gh/ezyang/3130/base 2025-09-07T06:13:52.9062910Z * [new branch] gh/ezyang/3130/head -> origin/gh/ezyang/3130/head 2025-09-07T06:13:52.9064244Z * [new branch] gh/ezyang/3130/orig -> origin/gh/ezyang/3130/orig 2025-09-07T06:13:52.9065767Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-09-07T06:13:52.9066918Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-09-07T06:13:52.9068054Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-09-07T06:13:52.9069599Z * [new branch] gh/ezyang/3132/base -> origin/gh/ezyang/3132/base 2025-09-07T06:13:52.9070712Z * [new branch] gh/ezyang/3132/head -> origin/gh/ezyang/3132/head 2025-09-07T06:13:52.9071878Z * [new branch] gh/ezyang/3132/orig -> origin/gh/ezyang/3132/orig 2025-09-07T06:13:52.9073404Z * [new branch] gh/ezyang/3133/base -> origin/gh/ezyang/3133/base 2025-09-07T06:13:52.9074494Z * [new branch] gh/ezyang/3133/head -> origin/gh/ezyang/3133/head 2025-09-07T06:13:52.9075648Z * [new branch] gh/ezyang/3133/orig -> origin/gh/ezyang/3133/orig 2025-09-07T06:13:52.9077222Z * [new branch] gh/ezyang/3134/base -> origin/gh/ezyang/3134/base 2025-09-07T06:13:52.9078454Z * [new branch] gh/ezyang/3134/head -> origin/gh/ezyang/3134/head 2025-09-07T06:13:52.9079459Z * [new branch] gh/ezyang/3134/orig -> origin/gh/ezyang/3134/orig 2025-09-07T06:13:52.9081098Z * [new branch] gh/ezyang/3135/base -> origin/gh/ezyang/3135/base 2025-09-07T06:13:52.9082197Z * [new branch] gh/ezyang/3135/head -> origin/gh/ezyang/3135/head 2025-09-07T06:13:52.9083378Z * [new branch] gh/ezyang/3135/orig -> origin/gh/ezyang/3135/orig 2025-09-07T06:13:52.9084905Z * [new branch] gh/ezyang/3136/base -> 
origin/gh/ezyang/3136/base 2025-09-07T06:13:52.9086144Z * [new branch] gh/ezyang/3136/head -> origin/gh/ezyang/3136/head 2025-09-07T06:13:52.9087250Z * [new branch] gh/ezyang/3136/orig -> origin/gh/ezyang/3136/orig 2025-09-07T06:13:52.9088793Z * [new branch] gh/ezyang/3137/base -> origin/gh/ezyang/3137/base 2025-09-07T06:13:52.9089965Z * [new branch] gh/ezyang/3137/head -> origin/gh/ezyang/3137/head 2025-09-07T06:13:52.9091029Z * [new branch] gh/ezyang/3137/orig -> origin/gh/ezyang/3137/orig 2025-09-07T06:13:52.9092913Z * [new branch] gh/ezyang/3138/base -> origin/gh/ezyang/3138/base 2025-09-07T06:13:52.9094095Z * [new branch] gh/ezyang/3138/head -> origin/gh/ezyang/3138/head 2025-09-07T06:13:52.9095290Z * [new branch] gh/ezyang/3138/orig -> origin/gh/ezyang/3138/orig 2025-09-07T06:13:52.9096961Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-09-07T06:13:52.9098099Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-09-07T06:13:52.9099275Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-09-07T06:13:52.9101065Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-09-07T06:13:52.9101952Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-09-07T06:13:52.9103173Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-09-07T06:13:52.9104897Z * [new branch] gh/ezyang/3141/base -> origin/gh/ezyang/3141/base 2025-09-07T06:13:52.9106036Z * [new branch] gh/ezyang/3141/head -> origin/gh/ezyang/3141/head 2025-09-07T06:13:52.9107156Z * [new branch] gh/ezyang/3141/orig -> origin/gh/ezyang/3141/orig 2025-09-07T06:13:52.9108707Z * [new branch] gh/ezyang/3142/base -> origin/gh/ezyang/3142/base 2025-09-07T06:13:52.9109808Z * [new branch] gh/ezyang/3142/head -> origin/gh/ezyang/3142/head 2025-09-07T06:13:52.9110943Z * [new branch] gh/ezyang/3142/orig -> origin/gh/ezyang/3142/orig 2025-09-07T06:13:52.9112478Z * [new branch] gh/ezyang/3143/base -> 
origin/gh/ezyang/3143/base 2025-09-07T06:13:52.9113581Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-09-07T06:13:52.9114731Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-09-07T06:13:52.9116541Z * [new branch] gh/fadara01/1/base -> origin/gh/fadara01/1/base 2025-09-07T06:13:52.9119208Z * [new branch] gh/fadara01/1/head -> origin/gh/fadara01/1/head 2025-09-07T06:13:52.9120387Z * [new branch] gh/fadara01/1/orig -> origin/gh/fadara01/1/orig 2025-09-07T06:13:52.9122447Z * [new branch] gh/fduwjj/171/base -> origin/gh/fduwjj/171/base 2025-09-07T06:13:52.9123685Z * [new branch] gh/fduwjj/171/head -> origin/gh/fduwjj/171/head 2025-09-07T06:13:52.9124784Z * [new branch] gh/fduwjj/171/orig -> origin/gh/fduwjj/171/orig 2025-09-07T06:13:52.9126641Z * [new branch] gh/fduwjj/175/base -> origin/gh/fduwjj/175/base 2025-09-07T06:13:52.9127888Z * [new branch] gh/fduwjj/175/head -> origin/gh/fduwjj/175/head 2025-09-07T06:13:52.9129010Z * [new branch] gh/fduwjj/175/orig -> origin/gh/fduwjj/175/orig 2025-09-07T06:13:52.9130575Z * [new branch] gh/fduwjj/176/base -> origin/gh/fduwjj/176/base 2025-09-07T06:13:52.9131899Z * [new branch] gh/fduwjj/176/head -> origin/gh/fduwjj/176/head 2025-09-07T06:13:52.9133158Z * [new branch] gh/fduwjj/176/orig -> origin/gh/fduwjj/176/orig 2025-09-07T06:13:52.9134711Z * [new branch] gh/fduwjj/177/base -> origin/gh/fduwjj/177/base 2025-09-07T06:13:52.9135920Z * [new branch] gh/fduwjj/177/head -> origin/gh/fduwjj/177/head 2025-09-07T06:13:52.9137070Z * [new branch] gh/fduwjj/177/orig -> origin/gh/fduwjj/177/orig 2025-09-07T06:13:52.9138680Z * [new branch] gh/fduwjj/178/base -> origin/gh/fduwjj/178/base 2025-09-07T06:13:52.9139922Z * [new branch] gh/fduwjj/178/head -> origin/gh/fduwjj/178/head 2025-09-07T06:13:52.9141079Z * [new branch] gh/fduwjj/178/orig -> origin/gh/fduwjj/178/orig 2025-09-07T06:13:52.9142665Z * [new branch] gh/fduwjj/179/base -> origin/gh/fduwjj/179/base 2025-09-07T06:13:52.9143781Z * [new 
branch] gh/fduwjj/179/head -> origin/gh/fduwjj/179/head 2025-09-07T06:13:52.9145028Z * [new branch] gh/fduwjj/179/orig -> origin/gh/fduwjj/179/orig 2025-09-07T06:13:52.9146619Z * [new branch] gh/fduwjj/180/base -> origin/gh/fduwjj/180/base 2025-09-07T06:13:52.9147798Z * [new branch] gh/fduwjj/180/head -> origin/gh/fduwjj/180/head 2025-09-07T06:13:52.9149089Z * [new branch] gh/fduwjj/180/orig -> origin/gh/fduwjj/180/orig 2025-09-07T06:13:52.9150983Z * [new branch] gh/fduwjj/181/base -> origin/gh/fduwjj/181/base 2025-09-07T06:13:52.9152148Z * [new branch] gh/fduwjj/181/head -> origin/gh/fduwjj/181/head 2025-09-07T06:13:52.9153276Z * [new branch] gh/fduwjj/181/orig -> origin/gh/fduwjj/181/orig 2025-09-07T06:13:52.9154834Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-09-07T06:13:52.9156091Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-09-07T06:13:52.9157242Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-09-07T06:13:52.9158927Z * [new branch] gh/fduwjj/183/base -> origin/gh/fduwjj/183/base 2025-09-07T06:13:52.9160305Z * [new branch] gh/fduwjj/183/head -> origin/gh/fduwjj/183/head 2025-09-07T06:13:52.9161673Z * [new branch] gh/fduwjj/183/orig -> origin/gh/fduwjj/183/orig 2025-09-07T06:13:52.9163483Z * [new branch] gh/fduwjj/184/base -> origin/gh/fduwjj/184/base 2025-09-07T06:13:52.9164569Z * [new branch] gh/fduwjj/184/head -> origin/gh/fduwjj/184/head 2025-09-07T06:13:52.9165646Z * [new branch] gh/fduwjj/184/orig -> origin/gh/fduwjj/184/orig 2025-09-07T06:13:52.9167276Z * [new branch] gh/fduwjj/185/base -> origin/gh/fduwjj/185/base 2025-09-07T06:13:52.9168396Z * [new branch] gh/fduwjj/185/head -> origin/gh/fduwjj/185/head 2025-09-07T06:13:52.9169552Z * [new branch] gh/fduwjj/185/orig -> origin/gh/fduwjj/185/orig 2025-09-07T06:13:52.9170925Z * [new branch] gh/fduwjj/186/base -> origin/gh/fduwjj/186/base 2025-09-07T06:13:52.9172367Z * [new branch] gh/fduwjj/186/head -> origin/gh/fduwjj/186/head 
2025-09-07T06:13:52.9173571Z * [new branch] gh/fduwjj/186/orig -> origin/gh/fduwjj/186/orig 2025-09-07T06:13:52.9175125Z * [new branch] gh/fduwjj/187/base -> origin/gh/fduwjj/187/base 2025-09-07T06:13:52.9176147Z * [new branch] gh/fduwjj/187/head -> origin/gh/fduwjj/187/head 2025-09-07T06:13:52.9177297Z * [new branch] gh/fduwjj/187/orig -> origin/gh/fduwjj/187/orig 2025-09-07T06:13:52.9178717Z * [new branch] gh/fduwjj/188/base -> origin/gh/fduwjj/188/base 2025-09-07T06:13:52.9179867Z * [new branch] gh/fduwjj/188/head -> origin/gh/fduwjj/188/head 2025-09-07T06:13:52.9180951Z * [new branch] gh/fduwjj/188/orig -> origin/gh/fduwjj/188/orig 2025-09-07T06:13:52.9182343Z * [new branch] gh/fduwjj/189/base -> origin/gh/fduwjj/189/base 2025-09-07T06:13:52.9183503Z * [new branch] gh/fduwjj/189/head -> origin/gh/fduwjj/189/head 2025-09-07T06:13:52.9184630Z * [new branch] gh/fduwjj/189/orig -> origin/gh/fduwjj/189/orig 2025-09-07T06:13:52.9186540Z * [new branch] gh/fduwjj/190/base -> origin/gh/fduwjj/190/base 2025-09-07T06:13:52.9187718Z * [new branch] gh/fduwjj/190/head -> origin/gh/fduwjj/190/head 2025-09-07T06:13:52.9188920Z * [new branch] gh/fduwjj/190/orig -> origin/gh/fduwjj/190/orig 2025-09-07T06:13:52.9190453Z * [new branch] gh/fduwjj/191/base -> origin/gh/fduwjj/191/base 2025-09-07T06:13:52.9191683Z * [new branch] gh/fduwjj/191/head -> origin/gh/fduwjj/191/head 2025-09-07T06:13:52.9192821Z * [new branch] gh/fduwjj/191/orig -> origin/gh/fduwjj/191/orig 2025-09-07T06:13:52.9194722Z * [new branch] gh/fegin/306/base -> origin/gh/fegin/306/base 2025-09-07T06:13:52.9195727Z * [new branch] gh/fegin/306/head -> origin/gh/fegin/306/head 2025-09-07T06:13:52.9196976Z * [new branch] gh/fegin/306/orig -> origin/gh/fegin/306/orig 2025-09-07T06:13:52.9198366Z * [new branch] gh/fegin/307/base -> origin/gh/fegin/307/base 2025-09-07T06:13:52.9199465Z * [new branch] gh/fegin/307/head -> origin/gh/fegin/307/head 2025-09-07T06:13:52.9200673Z * [new branch] gh/fegin/307/orig -> 
origin/gh/fegin/307/orig 2025-09-07T06:13:52.9202207Z * [new branch] gh/fegin/308/base -> origin/gh/fegin/308/base 2025-09-07T06:13:52.9203312Z * [new branch] gh/fegin/308/head -> origin/gh/fegin/308/head 2025-09-07T06:13:52.9204445Z * [new branch] gh/fegin/308/orig -> origin/gh/fegin/308/orig 2025-09-07T06:13:52.9205992Z * [new branch] gh/fegin/309/base -> origin/gh/fegin/309/base 2025-09-07T06:13:52.9207102Z * [new branch] gh/fegin/309/head -> origin/gh/fegin/309/head 2025-09-07T06:13:52.9208279Z * [new branch] gh/fegin/309/orig -> origin/gh/fegin/309/orig 2025-09-07T06:13:52.9209807Z * [new branch] gh/fegin/310/base -> origin/gh/fegin/310/base 2025-09-07T06:13:52.9210914Z * [new branch] gh/fegin/310/head -> origin/gh/fegin/310/head 2025-09-07T06:13:52.9212464Z * [new branch] gh/fegin/310/orig -> origin/gh/fegin/310/orig 2025-09-07T06:13:52.9214039Z * [new branch] gh/fegin/311/base -> origin/gh/fegin/311/base 2025-09-07T06:13:52.9215226Z * [new branch] gh/fegin/311/head -> origin/gh/fegin/311/head 2025-09-07T06:13:52.9216458Z * [new branch] gh/fegin/311/orig -> origin/gh/fegin/311/orig 2025-09-07T06:13:52.9217968Z * [new branch] gh/fegin/312/base -> origin/gh/fegin/312/base 2025-09-07T06:13:52.9219089Z * [new branch] gh/fegin/312/head -> origin/gh/fegin/312/head 2025-09-07T06:13:52.9220216Z * [new branch] gh/fegin/312/orig -> origin/gh/fegin/312/orig 2025-09-07T06:13:52.9221892Z * [new branch] gh/fegin/313/base -> origin/gh/fegin/313/base 2025-09-07T06:13:52.9223075Z * [new branch] gh/fegin/313/head -> origin/gh/fegin/313/head 2025-09-07T06:13:52.9224383Z * [new branch] gh/fegin/313/orig -> origin/gh/fegin/313/orig 2025-09-07T06:13:52.9226175Z * [new branch] gh/fffrog/124/base -> origin/gh/fffrog/124/base 2025-09-07T06:13:52.9227309Z * [new branch] gh/fffrog/124/head -> origin/gh/fffrog/124/head 2025-09-07T06:13:52.9228562Z * [new branch] gh/fffrog/124/orig -> origin/gh/fffrog/124/orig 2025-09-07T06:13:52.9230121Z * [new branch] gh/fffrog/129/base -> 
origin/gh/fffrog/129/base 2025-09-07T06:13:52.9231246Z * [new branch] gh/fffrog/129/head -> origin/gh/fffrog/129/head 2025-09-07T06:13:52.9232388Z * [new branch] gh/fffrog/129/orig -> origin/gh/fffrog/129/orig 2025-09-07T06:13:52.9233878Z * [new branch] gh/fffrog/130/base -> origin/gh/fffrog/130/base 2025-09-07T06:13:52.9234986Z * [new branch] gh/fffrog/130/head -> origin/gh/fffrog/130/head 2025-09-07T06:13:52.9236189Z * [new branch] gh/fffrog/130/orig -> origin/gh/fffrog/130/orig 2025-09-07T06:13:52.9237693Z * [new branch] gh/fffrog/131/base -> origin/gh/fffrog/131/base 2025-09-07T06:13:52.9238797Z * [new branch] gh/fffrog/131/head -> origin/gh/fffrog/131/head 2025-09-07T06:13:52.9239967Z * [new branch] gh/fffrog/131/orig -> origin/gh/fffrog/131/orig 2025-09-07T06:13:52.9241482Z * [new branch] gh/fffrog/132/base -> origin/gh/fffrog/132/base 2025-09-07T06:13:52.9242645Z * [new branch] gh/fffrog/132/head -> origin/gh/fffrog/132/head 2025-09-07T06:13:52.9243790Z * [new branch] gh/fffrog/132/orig -> origin/gh/fffrog/132/orig 2025-09-07T06:13:52.9245304Z * [new branch] gh/fffrog/133/base -> origin/gh/fffrog/133/base 2025-09-07T06:13:52.9246428Z * [new branch] gh/fffrog/133/head -> origin/gh/fffrog/133/head 2025-09-07T06:13:52.9247535Z * [new branch] gh/fffrog/133/orig -> origin/gh/fffrog/133/orig 2025-09-07T06:13:52.9249184Z * [new branch] gh/fffrog/134/base -> origin/gh/fffrog/134/base 2025-09-07T06:13:52.9250641Z * [new branch] gh/fffrog/134/head -> origin/gh/fffrog/134/head 2025-09-07T06:13:52.9251924Z * [new branch] gh/fffrog/134/orig -> origin/gh/fffrog/134/orig 2025-09-07T06:13:52.9253592Z * [new branch] gh/fffrog/135/base -> origin/gh/fffrog/135/base 2025-09-07T06:13:52.9254820Z * [new branch] gh/fffrog/135/head -> origin/gh/fffrog/135/head 2025-09-07T06:13:52.9256016Z * [new branch] gh/fffrog/135/orig -> origin/gh/fffrog/135/orig 2025-09-07T06:13:52.9257568Z * [new branch] gh/fffrog/136/base -> origin/gh/fffrog/136/base 2025-09-07T06:13:52.9258676Z * [new 
branch] gh/fffrog/136/head -> origin/gh/fffrog/136/head 2025-09-07T06:13:52.9259799Z * [new branch] gh/fffrog/136/orig -> origin/gh/fffrog/136/orig 2025-09-07T06:13:52.9263667Z * [new branch] gh/fffrog/137/base -> origin/gh/fffrog/137/base 2025-09-07T06:13:52.9263905Z * [new branch] gh/fffrog/137/head -> origin/gh/fffrog/137/head 2025-09-07T06:13:52.9264130Z * [new branch] gh/fffrog/137/orig -> origin/gh/fffrog/137/orig 2025-09-07T06:13:52.9265176Z * [new branch] gh/fffrog/138/base -> origin/gh/fffrog/138/base 2025-09-07T06:13:52.9266283Z * [new branch] gh/fffrog/138/head -> origin/gh/fffrog/138/head 2025-09-07T06:13:52.9267633Z * [new branch] gh/fffrog/138/orig -> origin/gh/fffrog/138/orig 2025-09-07T06:13:52.9269033Z * [new branch] gh/fffrog/139/base -> origin/gh/fffrog/139/base 2025-09-07T06:13:52.9270319Z * [new branch] gh/fffrog/139/head -> origin/gh/fffrog/139/head 2025-09-07T06:13:52.9271318Z * [new branch] gh/fffrog/139/orig -> origin/gh/fffrog/139/orig 2025-09-07T06:13:52.9272797Z * [new branch] gh/fffrog/140/base -> origin/gh/fffrog/140/base 2025-09-07T06:13:52.9273936Z * [new branch] gh/fffrog/140/head -> origin/gh/fffrog/140/head 2025-09-07T06:13:52.9274971Z * [new branch] gh/fffrog/140/orig -> origin/gh/fffrog/140/orig 2025-09-07T06:13:52.9276466Z * [new branch] gh/fffrog/141/base -> origin/gh/fffrog/141/base 2025-09-07T06:13:52.9277544Z * [new branch] gh/fffrog/141/head -> origin/gh/fffrog/141/head 2025-09-07T06:13:52.9278642Z * [new branch] gh/fffrog/141/orig -> origin/gh/fffrog/141/orig 2025-09-07T06:13:52.9280667Z * [new branch] gh/fffrog/142/base -> origin/gh/fffrog/142/base 2025-09-07T06:13:52.9281781Z * [new branch] gh/fffrog/142/head -> origin/gh/fffrog/142/head 2025-09-07T06:13:52.9282940Z * [new branch] gh/fffrog/142/orig -> origin/gh/fffrog/142/orig 2025-09-07T06:13:52.9284437Z * [new branch] gh/fffrog/143/base -> origin/gh/fffrog/143/base 2025-09-07T06:13:52.9285600Z * [new branch] gh/fffrog/143/head -> origin/gh/fffrog/143/head 
2025-09-07T06:13:52.9286729Z * [new branch] gh/fffrog/143/orig -> origin/gh/fffrog/143/orig 2025-09-07T06:13:52.9288633Z * [new branch] gh/fffrog/144/base -> origin/gh/fffrog/144/base 2025-09-07T06:13:52.9290019Z * [new branch] gh/fffrog/144/head -> origin/gh/fffrog/144/head 2025-09-07T06:13:52.9291174Z * [new branch] gh/fffrog/144/orig -> origin/gh/fffrog/144/orig 2025-09-07T06:13:52.9293149Z * [new branch] gh/fffrog/145/base -> origin/gh/fffrog/145/base 2025-09-07T06:13:52.9294293Z * [new branch] gh/fffrog/145/head -> origin/gh/fffrog/145/head 2025-09-07T06:13:52.9295506Z * [new branch] gh/fffrog/145/orig -> origin/gh/fffrog/145/orig 2025-09-07T06:13:52.9297005Z * [new branch] gh/fffrog/146/base -> origin/gh/fffrog/146/base 2025-09-07T06:13:52.9298152Z * [new branch] gh/fffrog/146/head -> origin/gh/fffrog/146/head 2025-09-07T06:13:52.9299314Z * [new branch] gh/fffrog/146/orig -> origin/gh/fffrog/146/orig 2025-09-07T06:13:52.9300864Z * [new branch] gh/fffrog/147/base -> origin/gh/fffrog/147/base 2025-09-07T06:13:52.9302030Z * [new branch] gh/fffrog/147/head -> origin/gh/fffrog/147/head 2025-09-07T06:13:52.9303219Z * [new branch] gh/fffrog/147/orig -> origin/gh/fffrog/147/orig 2025-09-07T06:13:52.9304910Z * [new branch] gh/fffrog/148/base -> origin/gh/fffrog/148/base 2025-09-07T06:13:52.9306139Z * [new branch] gh/fffrog/148/head -> origin/gh/fffrog/148/head 2025-09-07T06:13:52.9307487Z * [new branch] gh/fffrog/148/orig -> origin/gh/fffrog/148/orig 2025-09-07T06:13:52.9308935Z * [new branch] gh/fffrog/149/base -> origin/gh/fffrog/149/base 2025-09-07T06:13:52.9310003Z * [new branch] gh/fffrog/149/head -> origin/gh/fffrog/149/head 2025-09-07T06:13:52.9311258Z * [new branch] gh/fffrog/149/orig -> origin/gh/fffrog/149/orig 2025-09-07T06:13:52.9312809Z * [new branch] gh/fffrog/150/base -> origin/gh/fffrog/150/base 2025-09-07T06:13:52.9313881Z * [new branch] gh/fffrog/150/head -> origin/gh/fffrog/150/head 2025-09-07T06:13:52.9315623Z * [new branch] gh/fffrog/150/orig -> 
origin/gh/fffrog/150/orig 2025-09-07T06:13:52.9316580Z * [new branch] gh/fffrog/151/base -> origin/gh/fffrog/151/base 2025-09-07T06:13:52.9317732Z * [new branch] gh/fffrog/151/head -> origin/gh/fffrog/151/head 2025-09-07T06:13:52.9318883Z * [new branch] gh/fffrog/151/orig -> origin/gh/fffrog/151/orig 2025-09-07T06:13:52.9320369Z * [new branch] gh/fffrog/152/base -> origin/gh/fffrog/152/base 2025-09-07T06:13:52.9321530Z * [new branch] gh/fffrog/152/head -> origin/gh/fffrog/152/head 2025-09-07T06:13:52.9323099Z * [new branch] gh/fffrog/153/base -> origin/gh/fffrog/153/base 2025-09-07T06:13:52.9324124Z * [new branch] gh/fffrog/153/head -> origin/gh/fffrog/153/head 2025-09-07T06:13:52.9325233Z * [new branch] gh/fffrog/153/orig -> origin/gh/fffrog/153/orig 2025-09-07T06:13:52.9327118Z * [new branch] gh/gmagogsfm/1/base -> origin/gh/gmagogsfm/1/base 2025-09-07T06:13:52.9328040Z * [new branch] gh/gmagogsfm/1/head -> origin/gh/gmagogsfm/1/head 2025-09-07T06:13:52.9329366Z * [new branch] gh/gmagogsfm/1/orig -> origin/gh/gmagogsfm/1/orig 2025-09-07T06:13:52.9330899Z * [new branch] gh/gmagogsfm/2/base -> origin/gh/gmagogsfm/2/base 2025-09-07T06:13:52.9332068Z * [new branch] gh/gmagogsfm/2/head -> origin/gh/gmagogsfm/2/head 2025-09-07T06:13:52.9333352Z * [new branch] gh/gmagogsfm/2/orig -> origin/gh/gmagogsfm/2/orig 2025-09-07T06:13:52.9334804Z * [new branch] gh/gmagogsfm/3/base -> origin/gh/gmagogsfm/3/base 2025-09-07T06:13:52.9335948Z * [new branch] gh/gmagogsfm/3/head -> origin/gh/gmagogsfm/3/head 2025-09-07T06:13:52.9337275Z * [new branch] gh/gmagogsfm/3/orig -> origin/gh/gmagogsfm/3/orig 2025-09-07T06:13:52.9339035Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-09-07T06:13:52.9340215Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-09-07T06:13:52.9341411Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-09-07T06:13:52.9342969Z * [new branch] gh/guangyey/135/base -> origin/gh/guangyey/135/base 
2025-09-07T06:13:52.9344258Z * [new branch] gh/guangyey/135/head -> origin/gh/guangyey/135/head 2025-09-07T06:13:52.9345537Z * [new branch] gh/guangyey/135/orig -> origin/gh/guangyey/135/orig 2025-09-07T06:13:52.9346939Z * [new branch] gh/guangyey/139/base -> origin/gh/guangyey/139/base 2025-09-07T06:13:52.9348076Z * [new branch] gh/guangyey/139/head -> origin/gh/guangyey/139/head 2025-09-07T06:13:52.9349365Z * [new branch] gh/guangyey/139/orig -> origin/gh/guangyey/139/orig 2025-09-07T06:13:52.9351166Z * [new branch] gh/guangyey/140/base -> origin/gh/guangyey/140/base 2025-09-07T06:13:52.9352464Z * [new branch] gh/guangyey/140/head -> origin/gh/guangyey/140/head 2025-09-07T06:13:52.9353391Z * [new branch] gh/guangyey/140/orig -> origin/gh/guangyey/140/orig 2025-09-07T06:13:52.9354997Z * [new branch] gh/guangyey/142/base -> origin/gh/guangyey/142/base 2025-09-07T06:13:52.9356181Z * [new branch] gh/guangyey/142/head -> origin/gh/guangyey/142/head 2025-09-07T06:13:52.9357509Z * [new branch] gh/guangyey/142/orig -> origin/gh/guangyey/142/orig 2025-09-07T06:13:52.9359011Z * [new branch] gh/guangyey/145/base -> origin/gh/guangyey/145/base 2025-09-07T06:13:52.9360349Z * [new branch] gh/guangyey/145/head -> origin/gh/guangyey/145/head 2025-09-07T06:13:52.9361372Z * [new branch] gh/guangyey/145/orig -> origin/gh/guangyey/145/orig 2025-09-07T06:13:52.9362960Z * [new branch] gh/guangyey/153/base -> origin/gh/guangyey/153/base 2025-09-07T06:13:52.9364126Z * [new branch] gh/guangyey/153/head -> origin/gh/guangyey/153/head 2025-09-07T06:13:52.9365217Z * [new branch] gh/guangyey/153/orig -> origin/gh/guangyey/153/orig 2025-09-07T06:13:52.9366787Z * [new branch] gh/guangyey/159/base -> origin/gh/guangyey/159/base 2025-09-07T06:13:52.9367875Z * [new branch] gh/guangyey/159/head -> origin/gh/guangyey/159/head 2025-09-07T06:13:52.9369020Z * [new branch] gh/guangyey/159/orig -> origin/gh/guangyey/159/orig 2025-09-07T06:13:52.9370583Z * [new branch] gh/guangyey/163/base -> 
origin/gh/guangyey/163/base 2025-09-07T06:13:52.9371735Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-09-07T06:13:52.9373185Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-09-07T06:13:52.9374745Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-09-07T06:13:52.9375801Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-09-07T06:13:52.9377025Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-09-07T06:13:52.9378603Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-09-07T06:13:52.9379735Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-09-07T06:13:52.9380879Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-09-07T06:13:52.9382534Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-09-07T06:13:52.9383719Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-09-07T06:13:52.9385061Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-09-07T06:13:52.9386535Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-09-07T06:13:52.9387615Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-09-07T06:13:52.9388800Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-09-07T06:13:52.9390440Z * [new branch] gh/guangyey/174/base -> origin/gh/guangyey/174/base 2025-09-07T06:13:52.9391288Z * [new branch] gh/guangyey/174/head -> origin/gh/guangyey/174/head 2025-09-07T06:13:52.9392510Z * [new branch] gh/guangyey/174/orig -> origin/gh/guangyey/174/orig 2025-09-07T06:13:52.9393998Z * [new branch] gh/guangyey/176/base -> origin/gh/guangyey/176/base 2025-09-07T06:13:52.9395110Z * [new branch] gh/guangyey/176/head -> origin/gh/guangyey/176/head 2025-09-07T06:13:52.9396362Z * [new branch] gh/guangyey/176/orig -> origin/gh/guangyey/176/orig 2025-09-07T06:13:52.9397787Z * [new branch] 
gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-09-07T06:13:52.9398901Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-09-07T06:13:52.9400001Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-09-07T06:13:52.9401915Z * [new branch] gh/guangyey/181/base -> origin/gh/guangyey/181/base 2025-09-07T06:13:52.9403069Z * [new branch] gh/guangyey/181/head -> origin/gh/guangyey/181/head 2025-09-07T06:13:52.9404200Z * [new branch] gh/guangyey/181/orig -> origin/gh/guangyey/181/orig 2025-09-07T06:13:52.9405754Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-09-07T06:13:52.9406982Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-09-07T06:13:52.9407901Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-09-07T06:13:52.9409479Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-09-07T06:13:52.9410680Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-09-07T06:13:52.9412179Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-09-07T06:13:52.9413767Z * [new branch] gh/guangyey/184/base -> origin/gh/guangyey/184/base 2025-09-07T06:13:52.9414910Z * [new branch] gh/guangyey/184/head -> origin/gh/guangyey/184/head 2025-09-07T06:13:52.9416099Z * [new branch] gh/guangyey/184/orig -> origin/gh/guangyey/184/orig 2025-09-07T06:13:52.9417810Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-09-07T06:13:52.9418848Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-09-07T06:13:52.9420036Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-09-07T06:13:52.9421725Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-09-07T06:13:52.9422869Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-09-07T06:13:52.9424193Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 
2025-09-07T06:13:52.9425810Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-09-07T06:13:52.9426874Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-09-07T06:13:52.9428059Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-09-07T06:13:52.9429634Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-09-07T06:13:52.9430786Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-09-07T06:13:52.9431921Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-09-07T06:13:52.9433413Z * [new branch] gh/guangyey/189/base -> origin/gh/guangyey/189/base 2025-09-07T06:13:52.9434582Z * [new branch] gh/guangyey/189/head -> origin/gh/guangyey/189/head 2025-09-07T06:13:52.9435719Z * [new branch] gh/guangyey/189/orig -> origin/gh/guangyey/189/orig 2025-09-07T06:13:52.9437182Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-09-07T06:13:52.9438280Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-09-07T06:13:52.9439478Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-09-07T06:13:52.9440969Z * [new branch] gh/guangyey/191/base -> origin/gh/guangyey/191/base 2025-09-07T06:13:52.9442151Z * [new branch] gh/guangyey/191/head -> origin/gh/guangyey/191/head 2025-09-07T06:13:52.9443268Z * [new branch] gh/guangyey/191/orig -> origin/gh/guangyey/191/orig 2025-09-07T06:13:52.9444772Z * [new branch] gh/guangyey/192/base -> origin/gh/guangyey/192/base 2025-09-07T06:13:52.9445887Z * [new branch] gh/guangyey/192/head -> origin/gh/guangyey/192/head 2025-09-07T06:13:52.9447073Z * [new branch] gh/guangyey/192/orig -> origin/gh/guangyey/192/orig 2025-09-07T06:13:52.9448563Z * [new branch] gh/guangyey/193/base -> origin/gh/guangyey/193/base 2025-09-07T06:13:52.9450400Z * [new branch] gh/guangyey/193/head -> origin/gh/guangyey/193/head 2025-09-07T06:13:52.9451404Z * [new branch] gh/guangyey/193/orig -> 
origin/gh/guangyey/193/orig 2025-09-07T06:13:52.9453292Z * [new branch] gh/guangyey/194/base -> origin/gh/guangyey/194/base 2025-09-07T06:13:52.9454314Z * [new branch] gh/guangyey/194/head -> origin/gh/guangyey/194/head 2025-09-07T06:13:52.9455523Z * [new branch] gh/guangyey/194/orig -> origin/gh/guangyey/194/orig 2025-09-07T06:13:52.9457098Z * [new branch] gh/guangyey/195/base -> origin/gh/guangyey/195/base 2025-09-07T06:13:52.9458540Z * [new branch] gh/guangyey/195/head -> origin/gh/guangyey/195/head 2025-09-07T06:13:52.9459502Z * [new branch] gh/guangyey/195/orig -> origin/gh/guangyey/195/orig 2025-09-07T06:13:52.9461206Z * [new branch] gh/guangyey/196/base -> origin/gh/guangyey/196/base 2025-09-07T06:13:52.9462360Z * [new branch] gh/guangyey/196/head -> origin/gh/guangyey/196/head 2025-09-07T06:13:52.9463621Z * [new branch] gh/guangyey/196/orig -> origin/gh/guangyey/196/orig 2025-09-07T06:13:52.9465321Z * [new branch] gh/guangyey/197/base -> origin/gh/guangyey/197/base 2025-09-07T06:13:52.9466359Z * [new branch] gh/guangyey/197/head -> origin/gh/guangyey/197/head 2025-09-07T06:13:52.9467535Z * [new branch] gh/guangyey/197/orig -> origin/gh/guangyey/197/orig 2025-09-07T06:13:52.9469291Z * [new branch] gh/guangyey/198/base -> origin/gh/guangyey/198/base 2025-09-07T06:13:52.9470338Z * [new branch] gh/guangyey/198/head -> origin/gh/guangyey/198/head 2025-09-07T06:13:52.9471489Z * [new branch] gh/guangyey/198/orig -> origin/gh/guangyey/198/orig 2025-09-07T06:13:52.9473209Z * [new branch] gh/guangyey/199/base -> origin/gh/guangyey/199/base 2025-09-07T06:13:52.9474147Z * [new branch] gh/guangyey/199/head -> origin/gh/guangyey/199/head 2025-09-07T06:13:52.9475477Z * [new branch] gh/guangyey/199/orig -> origin/gh/guangyey/199/orig 2025-09-07T06:13:52.9477024Z * [new branch] gh/guangyey/200/base -> origin/gh/guangyey/200/base 2025-09-07T06:13:52.9477962Z * [new branch] gh/guangyey/200/head -> origin/gh/guangyey/200/head 2025-09-07T06:13:52.9479212Z * [new branch] 
gh/guangyey/200/orig -> origin/gh/guangyey/200/orig 2025-09-07T06:13:52.9480762Z * [new branch] gh/guangyey/201/base -> origin/gh/guangyey/201/base 2025-09-07T06:13:52.9481929Z * [new branch] gh/guangyey/201/head -> origin/gh/guangyey/201/head 2025-09-07T06:13:52.9483102Z * [new branch] gh/guangyey/201/orig -> origin/gh/guangyey/201/orig 2025-09-07T06:13:52.9484615Z * [new branch] gh/guangyey/202/base -> origin/gh/guangyey/202/base 2025-09-07T06:13:52.9485708Z * [new branch] gh/guangyey/202/head -> origin/gh/guangyey/202/head 2025-09-07T06:13:52.9486904Z * [new branch] gh/guangyey/202/orig -> origin/gh/guangyey/202/orig 2025-09-07T06:13:52.9488628Z * [new branch] gh/guangyey/203/base -> origin/gh/guangyey/203/base 2025-09-07T06:13:52.9489570Z * [new branch] gh/guangyey/203/head -> origin/gh/guangyey/203/head 2025-09-07T06:13:52.9490817Z * [new branch] gh/guangyey/203/orig -> origin/gh/guangyey/203/orig 2025-09-07T06:13:52.9492648Z * [new branch] gh/guangyey/204/base -> origin/gh/guangyey/204/base 2025-09-07T06:13:52.9493856Z * [new branch] gh/guangyey/204/head -> origin/gh/guangyey/204/head 2025-09-07T06:13:52.9495063Z * [new branch] gh/guangyey/204/orig -> origin/gh/guangyey/204/orig 2025-09-07T06:13:52.9496615Z * [new branch] gh/guangyey/205/base -> origin/gh/guangyey/205/base 2025-09-07T06:13:52.9497779Z * [new branch] gh/guangyey/205/head -> origin/gh/guangyey/205/head 2025-09-07T06:13:52.9498980Z * [new branch] gh/guangyey/205/orig -> origin/gh/guangyey/205/orig 2025-09-07T06:13:52.9500529Z * [new branch] gh/guangyey/206/base -> origin/gh/guangyey/206/base 2025-09-07T06:13:52.9501693Z * [new branch] gh/guangyey/206/head -> origin/gh/guangyey/206/head 2025-09-07T06:13:52.9502853Z * [new branch] gh/guangyey/206/orig -> origin/gh/guangyey/206/orig 2025-09-07T06:13:52.9504581Z * [new branch] gh/guangyey/207/base -> origin/gh/guangyey/207/base 2025-09-07T06:13:52.9505791Z * [new branch] gh/guangyey/207/head -> origin/gh/guangyey/207/head 
2025-09-07T06:13:52.9506793Z * [new branch] gh/guangyey/207/orig -> origin/gh/guangyey/207/orig 2025-09-07T06:13:52.9508456Z * [new branch] gh/guangyey/79/base -> origin/gh/guangyey/79/base 2025-09-07T06:13:52.9509579Z * [new branch] gh/guangyey/79/head -> origin/gh/guangyey/79/head 2025-09-07T06:13:52.9510626Z * [new branch] gh/guangyey/79/orig -> origin/gh/guangyey/79/orig 2025-09-07T06:13:52.9512136Z * [new branch] gh/guangyey/89/base -> origin/gh/guangyey/89/base 2025-09-07T06:13:52.9513175Z * [new branch] gh/guangyey/89/head -> origin/gh/guangyey/89/head 2025-09-07T06:13:52.9514321Z * [new branch] gh/guangyey/89/orig -> origin/gh/guangyey/89/orig 2025-09-07T06:13:52.9516728Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-09-07T06:13:52.9517877Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-09-07T06:13:52.9519026Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-09-07T06:13:52.9520479Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-09-07T06:13:52.9521586Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-09-07T06:13:52.9522701Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-09-07T06:13:52.9524251Z * [new branch] gh/guilhermeleobas/124/base -> origin/gh/guilhermeleobas/124/base 2025-09-07T06:13:52.9525518Z * [new branch] gh/guilhermeleobas/124/head -> origin/gh/guilhermeleobas/124/head 2025-09-07T06:13:52.9526538Z * [new branch] gh/guilhermeleobas/124/orig -> origin/gh/guilhermeleobas/124/orig 2025-09-07T06:13:52.9528024Z * [new branch] gh/guilhermeleobas/147/base -> origin/gh/guilhermeleobas/147/base 2025-09-07T06:13:52.9529198Z * [new branch] gh/guilhermeleobas/147/head -> origin/gh/guilhermeleobas/147/head 2025-09-07T06:13:52.9530372Z * [new branch] gh/guilhermeleobas/147/orig -> origin/gh/guilhermeleobas/147/orig 
2025-09-07T06:13:52.9532074Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-09-07T06:13:52.9544452Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-09-07T06:13:52.9544892Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-09-07T06:13:52.9545231Z * [new branch] gh/guilhermeleobas/163/base -> origin/gh/guilhermeleobas/163/base 2025-09-07T06:13:52.9545520Z * [new branch] gh/guilhermeleobas/163/head -> origin/gh/guilhermeleobas/163/head 2025-09-07T06:13:52.9545809Z * [new branch] gh/guilhermeleobas/163/orig -> origin/gh/guilhermeleobas/163/orig 2025-09-07T06:13:52.9546098Z * [new branch] gh/guilhermeleobas/164/base -> origin/gh/guilhermeleobas/164/base 2025-09-07T06:13:52.9546397Z * [new branch] gh/guilhermeleobas/164/head -> origin/gh/guilhermeleobas/164/head 2025-09-07T06:13:52.9546683Z * [new branch] gh/guilhermeleobas/164/orig -> origin/gh/guilhermeleobas/164/orig 2025-09-07T06:13:52.9547074Z * [new branch] gh/guilhermeleobas/165/base -> origin/gh/guilhermeleobas/165/base 2025-09-07T06:13:52.9547376Z * [new branch] gh/guilhermeleobas/165/head -> origin/gh/guilhermeleobas/165/head 2025-09-07T06:13:52.9547673Z * [new branch] gh/guilhermeleobas/165/orig -> origin/gh/guilhermeleobas/165/orig 2025-09-07T06:13:52.9547962Z * [new branch] gh/guilhermeleobas/166/base -> origin/gh/guilhermeleobas/166/base 2025-09-07T06:13:52.9549376Z * [new branch] gh/guilhermeleobas/166/head -> origin/gh/guilhermeleobas/166/head 2025-09-07T06:13:52.9550991Z * [new branch] gh/guilhermeleobas/166/orig -> origin/gh/guilhermeleobas/166/orig 2025-09-07T06:13:52.9552812Z * [new branch] gh/guilhermeleobas/167/base -> origin/gh/guilhermeleobas/167/base 2025-09-07T06:13:52.9553643Z * [new branch] gh/guilhermeleobas/167/head -> origin/gh/guilhermeleobas/167/head 2025-09-07T06:13:52.9555488Z * [new branch] gh/guilhermeleobas/167/orig -> origin/gh/guilhermeleobas/167/orig 
2025-09-07T06:13:52.9557038Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-09-07T06:13:52.9558411Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-09-07T06:13:52.9559600Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-09-07T06:13:52.9561290Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-09-07T06:13:52.9562400Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-09-07T06:13:52.9563512Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-09-07T06:13:52.9564999Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-09-07T06:13:52.9566101Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-09-07T06:13:52.9567242Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-09-07T06:13:52.9568708Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-09-07T06:13:52.9569803Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-09-07T06:13:52.9570937Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-09-07T06:13:52.9572877Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-09-07T06:13:52.9573949Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-09-07T06:13:52.9575087Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-09-07T06:13:52.9576608Z * [new branch] gh/guilhermeleobas/192/base -> origin/gh/guilhermeleobas/192/base 2025-09-07T06:13:52.9577812Z * [new branch] gh/guilhermeleobas/192/head -> origin/gh/guilhermeleobas/192/head 2025-09-07T06:13:52.9579033Z * [new branch] gh/guilhermeleobas/192/orig -> origin/gh/guilhermeleobas/192/orig 
2025-09-07T06:13:52.9580987Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-09-07T06:13:52.9582150Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-09-07T06:13:52.9584248Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-09-07T06:13:52.9585835Z * [new branch] gh/guilhermeleobas/194/base -> origin/gh/guilhermeleobas/194/base 2025-09-07T06:13:52.9586983Z * [new branch] gh/guilhermeleobas/194/head -> origin/gh/guilhermeleobas/194/head 2025-09-07T06:13:52.9588091Z * [new branch] gh/guilhermeleobas/194/orig -> origin/gh/guilhermeleobas/194/orig 2025-09-07T06:13:52.9589741Z * [new branch] gh/guilhermeleobas/203/base -> origin/gh/guilhermeleobas/203/base 2025-09-07T06:13:52.9590805Z * [new branch] gh/guilhermeleobas/203/head -> origin/gh/guilhermeleobas/203/head 2025-09-07T06:13:52.9591878Z * [new branch] gh/guilhermeleobas/203/orig -> origin/gh/guilhermeleobas/203/orig 2025-09-07T06:13:52.9593354Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-09-07T06:13:52.9594485Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-09-07T06:13:52.9595594Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-09-07T06:13:52.9597222Z * [new branch] gh/guilhermeleobas/205/base -> origin/gh/guilhermeleobas/205/base 2025-09-07T06:13:52.9598308Z * [new branch] gh/guilhermeleobas/205/head -> origin/gh/guilhermeleobas/205/head 2025-09-07T06:13:52.9599483Z * [new branch] gh/guilhermeleobas/205/orig -> origin/gh/guilhermeleobas/205/orig 2025-09-07T06:13:52.9601059Z * [new branch] gh/guilhermeleobas/209/base -> origin/gh/guilhermeleobas/209/base 2025-09-07T06:13:52.9602196Z * [new branch] gh/guilhermeleobas/209/head -> origin/gh/guilhermeleobas/209/head 2025-09-07T06:13:52.9603346Z * [new branch] gh/guilhermeleobas/209/orig -> origin/gh/guilhermeleobas/209/orig 
2025-09-07T06:13:52.9604925Z * [new branch] gh/guilhermeleobas/210/base -> origin/gh/guilhermeleobas/210/base 2025-09-07T06:13:52.9606056Z * [new branch] gh/guilhermeleobas/210/head -> origin/gh/guilhermeleobas/210/head 2025-09-07T06:13:52.9607188Z * [new branch] gh/guilhermeleobas/210/orig -> origin/gh/guilhermeleobas/210/orig 2025-09-07T06:13:52.9608765Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-09-07T06:13:52.9609887Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-09-07T06:13:52.9610973Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-09-07T06:13:52.9612963Z * [new branch] gh/guilhermeleobas/214/base -> origin/gh/guilhermeleobas/214/base 2025-09-07T06:13:52.9614066Z * [new branch] gh/guilhermeleobas/214/head -> origin/gh/guilhermeleobas/214/head 2025-09-07T06:13:52.9615222Z * [new branch] gh/guilhermeleobas/214/orig -> origin/gh/guilhermeleobas/214/orig 2025-09-07T06:13:52.9616861Z * [new branch] gh/guilhermeleobas/215/base -> origin/gh/guilhermeleobas/215/base 2025-09-07T06:13:52.9618108Z * [new branch] gh/guilhermeleobas/215/head -> origin/gh/guilhermeleobas/215/head 2025-09-07T06:13:52.9619260Z * [new branch] gh/guilhermeleobas/215/orig -> origin/gh/guilhermeleobas/215/orig 2025-09-07T06:13:52.9620827Z * [new branch] gh/guilhermeleobas/216/base -> origin/gh/guilhermeleobas/216/base 2025-09-07T06:13:52.9621979Z * [new branch] gh/guilhermeleobas/216/head -> origin/gh/guilhermeleobas/216/head 2025-09-07T06:13:52.9623161Z * [new branch] gh/guilhermeleobas/216/orig -> origin/gh/guilhermeleobas/216/orig 2025-09-07T06:13:52.9624865Z * [new branch] gh/guilhermeleobas/217/base -> origin/gh/guilhermeleobas/217/base 2025-09-07T06:13:52.9626098Z * [new branch] gh/guilhermeleobas/217/head -> origin/gh/guilhermeleobas/217/head 2025-09-07T06:13:52.9627257Z * [new branch] gh/guilhermeleobas/217/orig -> origin/gh/guilhermeleobas/217/orig 
2025-09-07T06:13:52.9628817Z * [new branch] gh/guilhermeleobas/219/base -> origin/gh/guilhermeleobas/219/base 2025-09-07T06:13:52.9629986Z * [new branch] gh/guilhermeleobas/219/head -> origin/gh/guilhermeleobas/219/head 2025-09-07T06:13:52.9631126Z * [new branch] gh/guilhermeleobas/219/orig -> origin/gh/guilhermeleobas/219/orig 2025-09-07T06:13:52.9632804Z * [new branch] gh/guilhermeleobas/220/base -> origin/gh/guilhermeleobas/220/base 2025-09-07T06:13:52.9633767Z * [new branch] gh/guilhermeleobas/220/head -> origin/gh/guilhermeleobas/220/head 2025-09-07T06:13:52.9634920Z * [new branch] gh/guilhermeleobas/220/orig -> origin/gh/guilhermeleobas/220/orig 2025-09-07T06:13:52.9636503Z * [new branch] gh/guilhermeleobas/221/base -> origin/gh/guilhermeleobas/221/base 2025-09-07T06:13:52.9637622Z * [new branch] gh/guilhermeleobas/221/head -> origin/gh/guilhermeleobas/221/head 2025-09-07T06:13:52.9638735Z * [new branch] gh/guilhermeleobas/221/orig -> origin/gh/guilhermeleobas/221/orig 2025-09-07T06:13:52.9640313Z * [new branch] gh/guilhermeleobas/222/base -> origin/gh/guilhermeleobas/222/base 2025-09-07T06:13:52.9641451Z * [new branch] gh/guilhermeleobas/222/head -> origin/gh/guilhermeleobas/222/head 2025-09-07T06:13:52.9642631Z * [new branch] gh/guilhermeleobas/222/orig -> origin/gh/guilhermeleobas/222/orig 2025-09-07T06:13:52.9644173Z * [new branch] gh/guilhermeleobas/223/base -> origin/gh/guilhermeleobas/223/base 2025-09-07T06:13:52.9645363Z * [new branch] gh/guilhermeleobas/223/head -> origin/gh/guilhermeleobas/223/head 2025-09-07T06:13:52.9646563Z * [new branch] gh/guilhermeleobas/223/orig -> origin/gh/guilhermeleobas/223/orig 2025-09-07T06:13:52.9648112Z * [new branch] gh/guilhermeleobas/224/base -> origin/gh/guilhermeleobas/224/base 2025-09-07T06:13:52.9649597Z * [new branch] gh/guilhermeleobas/224/head -> origin/gh/guilhermeleobas/224/head 2025-09-07T06:13:52.9650821Z * [new branch] gh/guilhermeleobas/224/orig -> origin/gh/guilhermeleobas/224/orig 
2025-09-07T06:13:52.9652592Z * [new branch] gh/guilhermeleobas/225/base -> origin/gh/guilhermeleobas/225/base 2025-09-07T06:13:52.9653757Z * [new branch] gh/guilhermeleobas/225/head -> origin/gh/guilhermeleobas/225/head 2025-09-07T06:13:52.9654912Z * [new branch] gh/guilhermeleobas/225/orig -> origin/gh/guilhermeleobas/225/orig 2025-09-07T06:13:52.9656417Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-09-07T06:13:52.9657614Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-09-07T06:13:52.9658798Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-09-07T06:13:52.9660421Z * [new branch] gh/guilhermeleobas/227/base -> origin/gh/guilhermeleobas/227/base 2025-09-07T06:13:52.9661619Z * [new branch] gh/guilhermeleobas/227/head -> origin/gh/guilhermeleobas/227/head 2025-09-07T06:13:52.9662833Z * [new branch] gh/guilhermeleobas/227/orig -> origin/gh/guilhermeleobas/227/orig 2025-09-07T06:13:52.9664531Z * [new branch] gh/guilhermeleobas/228/base -> origin/gh/guilhermeleobas/228/base 2025-09-07T06:13:52.9665637Z * [new branch] gh/guilhermeleobas/228/head -> origin/gh/guilhermeleobas/228/head 2025-09-07T06:13:52.9666685Z * [new branch] gh/guilhermeleobas/228/orig -> origin/gh/guilhermeleobas/228/orig 2025-09-07T06:13:52.9668203Z * [new branch] gh/guilhermeleobas/229/base -> origin/gh/guilhermeleobas/229/base 2025-09-07T06:13:52.9669350Z * [new branch] gh/guilhermeleobas/229/head -> origin/gh/guilhermeleobas/229/head 2025-09-07T06:13:52.9670494Z * [new branch] gh/guilhermeleobas/229/orig -> origin/gh/guilhermeleobas/229/orig 2025-09-07T06:13:52.9672090Z * [new branch] gh/guilhermeleobas/230/base -> origin/gh/guilhermeleobas/230/base 2025-09-07T06:13:52.9673226Z * [new branch] gh/guilhermeleobas/230/head -> origin/gh/guilhermeleobas/230/head 2025-09-07T06:13:52.9674354Z * [new branch] gh/guilhermeleobas/230/orig -> origin/gh/guilhermeleobas/230/orig 
2025-09-07T06:13:52.9676014Z * [new branch] gh/guilhermeleobas/231/base -> origin/gh/guilhermeleobas/231/base 2025-09-07T06:13:52.9676978Z * [new branch] gh/guilhermeleobas/231/head -> origin/gh/guilhermeleobas/231/head 2025-09-07T06:13:52.9678143Z * [new branch] gh/guilhermeleobas/231/orig -> origin/gh/guilhermeleobas/231/orig 2025-09-07T06:13:52.9679693Z * [new branch] gh/guilhermeleobas/232/base -> origin/gh/guilhermeleobas/232/base 2025-09-07T06:13:52.9680812Z * [new branch] gh/guilhermeleobas/232/head -> origin/gh/guilhermeleobas/232/head 2025-09-07T06:13:52.9681986Z * [new branch] gh/guilhermeleobas/232/orig -> origin/gh/guilhermeleobas/232/orig 2025-09-07T06:13:52.9683542Z * [new branch] gh/guilhermeleobas/233/base -> origin/gh/guilhermeleobas/233/base 2025-09-07T06:13:52.9684586Z * [new branch] gh/guilhermeleobas/233/head -> origin/gh/guilhermeleobas/233/head 2025-09-07T06:13:52.9685786Z * [new branch] gh/guilhermeleobas/233/orig -> origin/gh/guilhermeleobas/233/orig 2025-09-07T06:13:52.9687365Z * [new branch] gh/guilhermeleobas/234/base -> origin/gh/guilhermeleobas/234/base 2025-09-07T06:13:52.9688962Z * [new branch] gh/guilhermeleobas/234/head -> origin/gh/guilhermeleobas/234/head 2025-09-07T06:13:52.9690097Z * [new branch] gh/guilhermeleobas/234/orig -> origin/gh/guilhermeleobas/234/orig 2025-09-07T06:13:52.9691939Z * [new branch] gh/guilhermeleobas/235/base -> origin/gh/guilhermeleobas/235/base 2025-09-07T06:13:52.9693297Z * [new branch] gh/guilhermeleobas/235/head -> origin/gh/guilhermeleobas/235/head 2025-09-07T06:13:52.9694548Z * [new branch] gh/guilhermeleobas/235/orig -> origin/gh/guilhermeleobas/235/orig 2025-09-07T06:13:52.9696213Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-09-07T06:13:52.9697374Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-09-07T06:13:52.9698511Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 
2025-09-07T06:13:52.9700119Z * [new branch] gh/guilhermeleobas/237/base -> origin/gh/guilhermeleobas/237/base 2025-09-07T06:13:52.9701272Z * [new branch] gh/guilhermeleobas/237/head -> origin/gh/guilhermeleobas/237/head 2025-09-07T06:13:52.9702420Z * [new branch] gh/guilhermeleobas/237/orig -> origin/gh/guilhermeleobas/237/orig 2025-09-07T06:13:52.9704211Z * [new branch] gh/guilhermeleobas/238/base -> origin/gh/guilhermeleobas/238/base 2025-09-07T06:13:52.9705319Z * [new branch] gh/guilhermeleobas/238/head -> origin/gh/guilhermeleobas/238/head 2025-09-07T06:13:52.9706429Z * [new branch] gh/guilhermeleobas/238/orig -> origin/gh/guilhermeleobas/238/orig 2025-09-07T06:13:52.9707989Z * [new branch] gh/guilhermeleobas/239/base -> origin/gh/guilhermeleobas/239/base 2025-09-07T06:13:52.9709094Z * [new branch] gh/guilhermeleobas/239/head -> origin/gh/guilhermeleobas/239/head 2025-09-07T06:13:52.9710275Z * [new branch] gh/guilhermeleobas/239/orig -> origin/gh/guilhermeleobas/239/orig 2025-09-07T06:13:52.9711924Z * [new branch] gh/guilhermeleobas/240/base -> origin/gh/guilhermeleobas/240/base 2025-09-07T06:13:52.9713019Z * [new branch] gh/guilhermeleobas/240/head -> origin/gh/guilhermeleobas/240/head 2025-09-07T06:13:52.9714153Z * [new branch] gh/guilhermeleobas/240/orig -> origin/gh/guilhermeleobas/240/orig 2025-09-07T06:13:52.9715751Z * [new branch] gh/guilhermeleobas/241/base -> origin/gh/guilhermeleobas/241/base 2025-09-07T06:13:52.9716899Z * [new branch] gh/guilhermeleobas/241/head -> origin/gh/guilhermeleobas/241/head 2025-09-07T06:13:52.9718043Z * [new branch] gh/guilhermeleobas/241/orig -> origin/gh/guilhermeleobas/241/orig 2025-09-07T06:13:52.9719709Z * [new branch] gh/guilhermeleobas/242/base -> origin/gh/guilhermeleobas/242/base 2025-09-07T06:13:52.9720814Z * [new branch] gh/guilhermeleobas/242/head -> origin/gh/guilhermeleobas/242/head 2025-09-07T06:13:52.9721891Z * [new branch] gh/guilhermeleobas/242/orig -> origin/gh/guilhermeleobas/242/orig 
2025-09-07T06:13:52.9723377Z * [new branch] gh/guilhermeleobas/243/base -> origin/gh/guilhermeleobas/243/base 2025-09-07T06:13:52.9724527Z * [new branch] gh/guilhermeleobas/243/head -> origin/gh/guilhermeleobas/243/head 2025-09-07T06:13:52.9725696Z * [new branch] gh/guilhermeleobas/243/orig -> origin/gh/guilhermeleobas/243/orig 2025-09-07T06:13:52.9727244Z * [new branch] gh/guilhermeleobas/244/base -> origin/gh/guilhermeleobas/244/base 2025-09-07T06:13:52.9728373Z * [new branch] gh/guilhermeleobas/244/head -> origin/gh/guilhermeleobas/244/head 2025-09-07T06:13:52.9729500Z * [new branch] gh/guilhermeleobas/244/orig -> origin/gh/guilhermeleobas/244/orig 2025-09-07T06:13:52.9730997Z * [new branch] gh/guilhermeleobas/245/base -> origin/gh/guilhermeleobas/245/base 2025-09-07T06:13:52.9732412Z * [new branch] gh/guilhermeleobas/245/head -> origin/gh/guilhermeleobas/245/head 2025-09-07T06:13:52.9733674Z * [new branch] gh/guilhermeleobas/245/orig -> origin/gh/guilhermeleobas/245/orig 2025-09-07T06:13:52.9735341Z * [new branch] gh/guilhermeleobas/73/base -> origin/gh/guilhermeleobas/73/base 2025-09-07T06:13:52.9736501Z * [new branch] gh/guilhermeleobas/73/head -> origin/gh/guilhermeleobas/73/head 2025-09-07T06:13:52.9737645Z * [new branch] gh/guilhermeleobas/73/orig -> origin/gh/guilhermeleobas/73/orig 2025-09-07T06:13:52.9739579Z * [new branch] gh/henrylhtsang/140/base -> origin/gh/henrylhtsang/140/base 2025-09-07T06:13:52.9740843Z * [new branch] gh/henrylhtsang/140/head -> origin/gh/henrylhtsang/140/head 2025-09-07T06:13:52.9741988Z * [new branch] gh/henrylhtsang/140/orig -> origin/gh/henrylhtsang/140/orig 2025-09-07T06:13:52.9743469Z * [new branch] gh/henrylhtsang/141/base -> origin/gh/henrylhtsang/141/base 2025-09-07T06:13:52.9744703Z * [new branch] gh/henrylhtsang/141/head -> origin/gh/henrylhtsang/141/head 2025-09-07T06:13:52.9745796Z * [new branch] gh/henrylhtsang/141/orig -> origin/gh/henrylhtsang/141/orig 2025-09-07T06:13:52.9747645Z * [new branch] 
gh/henrylhtsang/142/base -> origin/gh/henrylhtsang/142/base 2025-09-07T06:13:52.9749104Z * [new branch] gh/henrylhtsang/142/head -> origin/gh/henrylhtsang/142/head 2025-09-07T06:13:52.9750609Z * [new branch] gh/henrylhtsang/142/orig -> origin/gh/henrylhtsang/142/orig 2025-09-07T06:13:52.9752178Z * [new branch] gh/henrylhtsang/143/base -> origin/gh/henrylhtsang/143/base 2025-09-07T06:13:52.9753338Z * [new branch] gh/henrylhtsang/143/head -> origin/gh/henrylhtsang/143/head 2025-09-07T06:13:52.9754475Z * [new branch] gh/henrylhtsang/143/orig -> origin/gh/henrylhtsang/143/orig 2025-09-07T06:13:52.9756155Z * [new branch] gh/henrylhtsang/144/base -> origin/gh/henrylhtsang/144/base 2025-09-07T06:13:52.9757306Z * [new branch] gh/henrylhtsang/144/head -> origin/gh/henrylhtsang/144/head 2025-09-07T06:13:52.9758488Z * [new branch] gh/henrylhtsang/144/orig -> origin/gh/henrylhtsang/144/orig 2025-09-07T06:13:52.9760113Z * [new branch] gh/henrylhtsang/145/base -> origin/gh/henrylhtsang/145/base 2025-09-07T06:13:52.9761537Z * [new branch] gh/henrylhtsang/145/head -> origin/gh/henrylhtsang/145/head 2025-09-07T06:13:52.9762690Z * [new branch] gh/henrylhtsang/145/orig -> origin/gh/henrylhtsang/145/orig 2025-09-07T06:13:52.9764378Z * [new branch] gh/henrylhtsang/146/base -> origin/gh/henrylhtsang/146/base 2025-09-07T06:13:52.9765647Z * [new branch] gh/henrylhtsang/146/head -> origin/gh/henrylhtsang/146/head 2025-09-07T06:13:52.9766843Z * [new branch] gh/henrylhtsang/146/orig -> origin/gh/henrylhtsang/146/orig 2025-09-07T06:13:52.9768232Z * [new branch] gh/henrylhtsang/147/base -> origin/gh/henrylhtsang/147/base 2025-09-07T06:13:52.9769309Z * [new branch] gh/henrylhtsang/147/head -> origin/gh/henrylhtsang/147/head 2025-09-07T06:13:52.9770421Z * [new branch] gh/henrylhtsang/147/orig -> origin/gh/henrylhtsang/147/orig 2025-09-07T06:13:52.9772593Z * [new branch] gh/henrylhtsang/148/base -> origin/gh/henrylhtsang/148/base 2025-09-07T06:13:52.9773971Z * [new branch] 
gh/henrylhtsang/148/head -> origin/gh/henrylhtsang/148/head 2025-09-07T06:13:52.9775133Z * [new branch] gh/henrylhtsang/148/orig -> origin/gh/henrylhtsang/148/orig 2025-09-07T06:13:52.9776777Z * [new branch] gh/henrylhtsang/149/base -> origin/gh/henrylhtsang/149/base 2025-09-07T06:13:52.9778002Z * [new branch] gh/henrylhtsang/149/head -> origin/gh/henrylhtsang/149/head 2025-09-07T06:13:52.9779172Z * [new branch] gh/henrylhtsang/149/orig -> origin/gh/henrylhtsang/149/orig 2025-09-07T06:13:52.9781030Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-09-07T06:13:52.9782518Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-09-07T06:13:52.9784123Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-09-07T06:13:52.9785648Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-09-07T06:13:52.9787135Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-09-07T06:13:52.9788604Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-09-07T06:13:52.9790521Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-09-07T06:13:52.9791645Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-09-07T06:13:52.9793539Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-09-07T06:13:52.9794686Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-09-07T06:13:52.9796247Z * [new branch] gh/isuruf/141/base -> origin/gh/isuruf/141/base 2025-09-07T06:13:52.9797365Z * [new branch] gh/isuruf/141/head -> origin/gh/isuruf/141/head 2025-09-07T06:13:52.9798498Z * [new branch] gh/isuruf/141/orig -> origin/gh/isuruf/141/orig 2025-09-07T06:13:52.9799989Z * [new branch] gh/isuruf/142/base -> origin/gh/isuruf/142/base 2025-09-07T06:13:52.9801117Z * [new branch] gh/isuruf/142/head -> origin/gh/isuruf/142/head 2025-09-07T06:13:52.9802263Z * [new branch] gh/isuruf/142/orig -> origin/gh/isuruf/142/orig 2025-09-07T06:13:52.9803756Z * [new branch] gh/isuruf/143/base -> 
origin/gh/isuruf/143/base 2025-09-07T06:13:52.9804860Z * [new branch] gh/isuruf/143/head -> origin/gh/isuruf/143/head 2025-09-07T06:13:52.9805965Z * [new branch] gh/isuruf/143/orig -> origin/gh/isuruf/143/orig 2025-09-07T06:13:52.9807504Z * [new branch] gh/isuruf/144/base -> origin/gh/isuruf/144/base 2025-09-07T06:13:52.9808596Z * [new branch] gh/isuruf/144/head -> origin/gh/isuruf/144/head 2025-09-07T06:13:52.9809707Z * [new branch] gh/isuruf/144/orig -> origin/gh/isuruf/144/orig 2025-09-07T06:13:52.9811206Z * [new branch] gh/isuruf/145/base -> origin/gh/isuruf/145/base 2025-09-07T06:13:52.9812751Z * [new branch] gh/isuruf/145/head -> origin/gh/isuruf/145/head 2025-09-07T06:13:52.9813980Z * [new branch] gh/isuruf/145/orig -> origin/gh/isuruf/145/orig 2025-09-07T06:13:52.9815467Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-09-07T06:13:52.9816650Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-09-07T06:13:52.9817797Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-09-07T06:13:52.9819338Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-09-07T06:13:52.9820486Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-09-07T06:13:52.9821647Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-09-07T06:13:52.9823570Z * [new branch] gh/jamesjwu/150/base -> origin/gh/jamesjwu/150/base 2025-09-07T06:13:52.9824800Z * [new branch] gh/jamesjwu/150/head -> origin/gh/jamesjwu/150/head 2025-09-07T06:13:52.9825908Z * [new branch] gh/jamesjwu/150/orig -> origin/gh/jamesjwu/150/orig 2025-09-07T06:13:52.9827555Z * [new branch] gh/jamesjwu/154/base -> origin/gh/jamesjwu/154/base 2025-09-07T06:13:52.9828615Z * [new branch] gh/jamesjwu/154/head -> origin/gh/jamesjwu/154/head 2025-09-07T06:13:52.9829919Z * [new branch] gh/jamesjwu/154/orig -> origin/gh/jamesjwu/154/orig 2025-09-07T06:13:52.9831461Z * [new branch] gh/jamesjwu/155/base -> origin/gh/jamesjwu/155/base 
2025-09-07T06:13:52.9832573Z * [new branch] gh/jamesjwu/155/head -> origin/gh/jamesjwu/155/head 2025-09-07T06:13:52.9833692Z * [new branch] gh/jamesjwu/155/orig -> origin/gh/jamesjwu/155/orig 2025-09-07T06:13:52.9835204Z * [new branch] gh/jamesjwu/159/base -> origin/gh/jamesjwu/159/base 2025-09-07T06:13:52.9836338Z * [new branch] gh/jamesjwu/159/head -> origin/gh/jamesjwu/159/head 2025-09-07T06:13:52.9837512Z * [new branch] gh/jamesjwu/159/orig -> origin/gh/jamesjwu/159/orig 2025-09-07T06:13:52.9839427Z * [new branch] gh/jamesjwu/163/base -> origin/gh/jamesjwu/163/base 2025-09-07T06:13:52.9840550Z * [new branch] gh/jamesjwu/163/head -> origin/gh/jamesjwu/163/head 2025-09-07T06:13:52.9841688Z * [new branch] gh/jamesjwu/163/orig -> origin/gh/jamesjwu/163/orig 2025-09-07T06:13:52.9843198Z * [new branch] gh/jamesjwu/171/base -> origin/gh/jamesjwu/171/base 2025-09-07T06:13:52.9844288Z * [new branch] gh/jamesjwu/171/head -> origin/gh/jamesjwu/171/head 2025-09-07T06:13:52.9845432Z * [new branch] gh/jamesjwu/171/orig -> origin/gh/jamesjwu/171/orig 2025-09-07T06:13:52.9846942Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-09-07T06:13:52.9848070Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-09-07T06:13:52.9849484Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-09-07T06:13:52.9851259Z * [new branch] gh/jamesjwu/181/base -> origin/gh/jamesjwu/181/base 2025-09-07T06:13:52.9852567Z * [new branch] gh/jamesjwu/181/head -> origin/gh/jamesjwu/181/head 2025-09-07T06:13:52.9853746Z * [new branch] gh/jamesjwu/181/orig -> origin/gh/jamesjwu/181/orig 2025-09-07T06:13:52.9855282Z * [new branch] gh/jamesjwu/182/base -> origin/gh/jamesjwu/182/base 2025-09-07T06:13:52.9856435Z * [new branch] gh/jamesjwu/182/head -> origin/gh/jamesjwu/182/head 2025-09-07T06:13:52.9857637Z * [new branch] gh/jamesjwu/182/orig -> origin/gh/jamesjwu/182/orig 2025-09-07T06:13:52.9859192Z * [new branch] gh/jamesjwu/183/base -> 
origin/gh/jamesjwu/183/base 2025-09-07T06:13:52.9860455Z * [new branch] gh/jamesjwu/183/head -> origin/gh/jamesjwu/183/head 2025-09-07T06:13:52.9861515Z * [new branch] gh/jamesjwu/183/orig -> origin/gh/jamesjwu/183/orig 2025-09-07T06:13:52.9863043Z * [new branch] gh/jamesjwu/184/base -> origin/gh/jamesjwu/184/base 2025-09-07T06:13:52.9864257Z * [new branch] gh/jamesjwu/184/head -> origin/gh/jamesjwu/184/head 2025-09-07T06:13:52.9865459Z * [new branch] gh/jamesjwu/184/orig -> origin/gh/jamesjwu/184/orig 2025-09-07T06:13:52.9866984Z * [new branch] gh/jamesjwu/185/base -> origin/gh/jamesjwu/185/base 2025-09-07T06:13:52.9868080Z * [new branch] gh/jamesjwu/185/head -> origin/gh/jamesjwu/185/head 2025-09-07T06:13:52.9869266Z * [new branch] gh/jamesjwu/185/orig -> origin/gh/jamesjwu/185/orig 2025-09-07T06:13:52.9870741Z * [new branch] gh/jamesjwu/186/base -> origin/gh/jamesjwu/186/base 2025-09-07T06:13:52.9872047Z * [new branch] gh/jamesjwu/186/head -> origin/gh/jamesjwu/186/head 2025-09-07T06:13:52.9872923Z * [new branch] gh/jamesjwu/186/orig -> origin/gh/jamesjwu/186/orig 2025-09-07T06:13:52.9874434Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-09-07T06:13:52.9875533Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-09-07T06:13:52.9876624Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-09-07T06:13:52.9878202Z * [new branch] gh/jamesjwu/188/base -> origin/gh/jamesjwu/188/base 2025-09-07T06:13:52.9879309Z * [new branch] gh/jamesjwu/188/head -> origin/gh/jamesjwu/188/head 2025-09-07T06:13:52.9880441Z * [new branch] gh/jamesjwu/188/orig -> origin/gh/jamesjwu/188/orig 2025-09-07T06:13:52.9881892Z * [new branch] gh/jamesjwu/189/base -> origin/gh/jamesjwu/189/base 2025-09-07T06:13:52.9883090Z * [new branch] gh/jamesjwu/189/head -> origin/gh/jamesjwu/189/head 2025-09-07T06:13:52.9884157Z * [new branch] gh/jamesjwu/189/orig -> origin/gh/jamesjwu/189/orig 2025-09-07T06:13:52.9886072Z * [new branch] 
gh/jamesjwu/190/base -> origin/gh/jamesjwu/190/base 2025-09-07T06:13:52.9887221Z * [new branch] gh/jamesjwu/190/head -> origin/gh/jamesjwu/190/head 2025-09-07T06:13:52.9888414Z * [new branch] gh/jamesjwu/190/orig -> origin/gh/jamesjwu/190/orig 2025-09-07T06:13:52.9890055Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-09-07T06:13:52.9891196Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-09-07T06:13:52.9893083Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-09-07T06:13:52.9894271Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-09-07T06:13:52.9895717Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-09-07T06:13:52.9896936Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-09-07T06:13:52.9898402Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-09-07T06:13:52.9899495Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-09-07T06:13:52.9900924Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-09-07T06:13:52.9902033Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-09-07T06:13:52.9903501Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-09-07T06:13:52.9904802Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-09-07T06:13:52.9906307Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-09-07T06:13:52.9907331Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-09-07T06:13:52.9908738Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-09-07T06:13:52.9909821Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-09-07T06:13:52.9911221Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-09-07T06:13:52.9912305Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-09-07T06:13:52.9913744Z * [new branch] gh/jamesjwu/61/base 
-> origin/gh/jamesjwu/61/base 2025-09-07T06:13:52.9914891Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-09-07T06:13:52.9916330Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-09-07T06:13:52.9917359Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-09-07T06:13:52.9918774Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-09-07T06:13:52.9919951Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-09-07T06:13:52.9921587Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-09-07T06:13:52.9922719Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-09-07T06:13:52.9924184Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-09-07T06:13:52.9925211Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-09-07T06:13:52.9927134Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-09-07T06:13:52.9928395Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-09-07T06:13:52.9929558Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-09-07T06:13:52.9930955Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-09-07T06:13:52.9932389Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-09-07T06:13:52.9933541Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-09-07T06:13:52.9935463Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-09-07T06:13:52.9936605Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-09-07T06:13:52.9937766Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-09-07T06:13:52.9939433Z * [new branch] gh/janeyx99/296/base -> origin/gh/janeyx99/296/base 2025-09-07T06:13:52.9940584Z * [new branch] gh/janeyx99/296/head -> origin/gh/janeyx99/296/head 2025-09-07T06:13:52.9941735Z * [new branch] gh/janeyx99/296/orig -> 
origin/gh/janeyx99/296/orig 2025-09-07T06:13:52.9943345Z * [new branch] gh/janeyx99/297/base -> origin/gh/janeyx99/297/base 2025-09-07T06:13:52.9944587Z * [new branch] gh/janeyx99/297/head -> origin/gh/janeyx99/297/head 2025-09-07T06:13:52.9945712Z * [new branch] gh/janeyx99/297/orig -> origin/gh/janeyx99/297/orig 2025-09-07T06:13:52.9947222Z * [new branch] gh/janeyx99/298/base -> origin/gh/janeyx99/298/base 2025-09-07T06:13:52.9948314Z * [new branch] gh/janeyx99/298/head -> origin/gh/janeyx99/298/head 2025-09-07T06:13:52.9950789Z * [new branch] gh/janeyx99/298/orig -> origin/gh/janeyx99/298/orig 2025-09-07T06:13:52.9952701Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-09-07T06:13:52.9954139Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-09-07T06:13:52.9954999Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-09-07T06:13:52.9956832Z * [new branch] gh/janeyx99/300/base -> origin/gh/janeyx99/300/base 2025-09-07T06:13:52.9958204Z * [new branch] gh/janeyx99/300/head -> origin/gh/janeyx99/300/head 2025-09-07T06:13:52.9959388Z * [new branch] gh/janeyx99/300/orig -> origin/gh/janeyx99/300/orig 2025-09-07T06:13:52.9960969Z * [new branch] gh/janeyx99/301/base -> origin/gh/janeyx99/301/base 2025-09-07T06:13:52.9962251Z * [new branch] gh/janeyx99/301/head -> origin/gh/janeyx99/301/head 2025-09-07T06:13:52.9963398Z * [new branch] gh/janeyx99/301/orig -> origin/gh/janeyx99/301/orig 2025-09-07T06:13:52.9964755Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-09-07T06:13:52.9966062Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-09-07T06:13:52.9967466Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-09-07T06:13:52.9968606Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-09-07T06:13:52.9970213Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-09-07T06:13:52.9971413Z * [new branch] 
gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-09-07T06:13:52.9972838Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-09-07T06:13:52.9974753Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-09-07T06:13:52.9975851Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-09-07T06:13:52.9977400Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-09-07T06:13:52.9978550Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-09-07T06:13:52.9979717Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-09-07T06:13:52.9981255Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-09-07T06:13:52.9982413Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-09-07T06:13:52.9983545Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-09-07T06:13:52.9985185Z * [new branch] gh/jansel/531/base -> origin/gh/jansel/531/base 2025-09-07T06:13:52.9986337Z * [new branch] gh/jansel/531/head -> origin/gh/jansel/531/head 2025-09-07T06:13:52.9987428Z * [new branch] gh/jansel/531/orig -> origin/gh/jansel/531/orig 2025-09-07T06:13:52.9989423Z * [new branch] gh/jbschlosser/208/head -> origin/gh/jbschlosser/208/head 2025-09-07T06:13:52.9991025Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-09-07T06:13:52.9992606Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-09-07T06:13:52.9993752Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-09-07T06:13:52.9995377Z * [new branch] gh/jbschlosser/248/base -> origin/gh/jbschlosser/248/base 2025-09-07T06:13:52.9996515Z * [new branch] gh/jbschlosser/248/head -> origin/gh/jbschlosser/248/head 2025-09-07T06:13:52.9997650Z * [new branch] gh/jbschlosser/248/orig -> origin/gh/jbschlosser/248/orig 2025-09-07T06:13:52.9999296Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 
2025-09-07T06:13:53.0000428Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-09-07T06:13:53.0001639Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-09-07T06:13:53.0003399Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-09-07T06:13:53.0004527Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-09-07T06:13:53.0005705Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-09-07T06:13:53.0007156Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-09-07T06:13:53.0008320Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-09-07T06:13:53.0009442Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-09-07T06:13:53.0010940Z * [new branch] gh/jiayisunx/64/base -> origin/gh/jiayisunx/64/base 2025-09-07T06:13:53.0012368Z * [new branch] gh/jiayisunx/64/head -> origin/gh/jiayisunx/64/head 2025-09-07T06:13:53.0013534Z * [new branch] gh/jiayisunx/64/orig -> origin/gh/jiayisunx/64/orig 2025-09-07T06:13:53.0015083Z * [new branch] gh/jiayisunx/65/base -> origin/gh/jiayisunx/65/base 2025-09-07T06:13:53.0016293Z * [new branch] gh/jiayisunx/65/head -> origin/gh/jiayisunx/65/head 2025-09-07T06:13:53.0017424Z * [new branch] gh/jiayisunx/65/orig -> origin/gh/jiayisunx/65/orig 2025-09-07T06:13:53.0018977Z * [new branch] gh/jiayisunx/66/base -> origin/gh/jiayisunx/66/base 2025-09-07T06:13:53.0020162Z * [new branch] gh/jiayisunx/66/head -> origin/gh/jiayisunx/66/head 2025-09-07T06:13:53.0021296Z * [new branch] gh/jiayisunx/66/orig -> origin/gh/jiayisunx/66/orig 2025-09-07T06:13:53.0022827Z * [new branch] gh/jiayisunx/67/base -> origin/gh/jiayisunx/67/base 2025-09-07T06:13:53.0024072Z * [new branch] gh/jiayisunx/67/head -> origin/gh/jiayisunx/67/head 2025-09-07T06:13:53.0025196Z * [new branch] gh/jiayisunx/67/orig -> origin/gh/jiayisunx/67/orig 2025-09-07T06:13:53.0026723Z * [new branch] gh/jiayisunx/68/base -> 
origin/gh/jiayisunx/68/base 2025-09-07T06:13:53.0027809Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-09-07T06:13:53.0028923Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-09-07T06:13:53.0030472Z * [new branch] gh/jiayisunx/69/base -> origin/gh/jiayisunx/69/base 2025-09-07T06:13:53.0031616Z * [new branch] gh/jiayisunx/69/head -> origin/gh/jiayisunx/69/head 2025-09-07T06:13:53.0032714Z * [new branch] gh/jiayisunx/69/orig -> origin/gh/jiayisunx/69/orig 2025-09-07T06:13:53.0034315Z * [new branch] gh/jiayisunx/70/base -> origin/gh/jiayisunx/70/base 2025-09-07T06:13:53.0035446Z * [new branch] gh/jiayisunx/70/head -> origin/gh/jiayisunx/70/head 2025-09-07T06:13:53.0036576Z * [new branch] gh/jiayisunx/70/orig -> origin/gh/jiayisunx/70/orig 2025-09-07T06:13:53.0038041Z * [new branch] gh/jiayisunx/71/base -> origin/gh/jiayisunx/71/base 2025-09-07T06:13:53.0039148Z * [new branch] gh/jiayisunx/71/head -> origin/gh/jiayisunx/71/head 2025-09-07T06:13:53.0040263Z * [new branch] gh/jiayisunx/71/orig -> origin/gh/jiayisunx/71/orig 2025-09-07T06:13:53.0041805Z * [new branch] gh/jiayisunx/72/base -> origin/gh/jiayisunx/72/base 2025-09-07T06:13:53.0042868Z * [new branch] gh/jiayisunx/72/head -> origin/gh/jiayisunx/72/head 2025-09-07T06:13:53.0043981Z * [new branch] gh/jiayisunx/72/orig -> origin/gh/jiayisunx/72/orig 2025-09-07T06:13:53.0045611Z * [new branch] gh/jiayisunx/73/base -> origin/gh/jiayisunx/73/base 2025-09-07T06:13:53.0046831Z * [new branch] gh/jiayisunx/73/head -> origin/gh/jiayisunx/73/head 2025-09-07T06:13:53.0047894Z * [new branch] gh/jiayisunx/73/orig -> origin/gh/jiayisunx/73/orig 2025-09-07T06:13:53.0049825Z * [new branch] gh/jiayisunx/74/base -> origin/gh/jiayisunx/74/base 2025-09-07T06:13:53.0050986Z * [new branch] gh/jiayisunx/74/head -> origin/gh/jiayisunx/74/head 2025-09-07T06:13:53.0052430Z * [new branch] gh/jiayisunx/74/orig -> origin/gh/jiayisunx/74/orig 2025-09-07T06:13:53.0053973Z * [new branch] 
gh/jiayisunx/75/base -> origin/gh/jiayisunx/75/base 2025-09-07T06:13:53.0054903Z * [new branch] gh/jiayisunx/75/head -> origin/gh/jiayisunx/75/head 2025-09-07T06:13:53.0056045Z * [new branch] gh/jiayisunx/75/orig -> origin/gh/jiayisunx/75/orig 2025-09-07T06:13:53.0057582Z * [new branch] gh/jiayisunx/76/base -> origin/gh/jiayisunx/76/base 2025-09-07T06:13:53.0058638Z * [new branch] gh/jiayisunx/76/head -> origin/gh/jiayisunx/76/head 2025-09-07T06:13:53.0059794Z * [new branch] gh/jiayisunx/76/orig -> origin/gh/jiayisunx/76/orig 2025-09-07T06:13:53.0061555Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-09-07T06:13:53.0062700Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-09-07T06:13:53.0064655Z * [new branch] gh/justinchuby/111/base -> origin/gh/justinchuby/111/base 2025-09-07T06:13:53.0065832Z * [new branch] gh/justinchuby/111/head -> origin/gh/justinchuby/111/head 2025-09-07T06:13:53.0067022Z * [new branch] gh/justinchuby/111/orig -> origin/gh/justinchuby/111/orig 2025-09-07T06:13:53.0068544Z * [new branch] gh/justinchuby/112/base -> origin/gh/justinchuby/112/base 2025-09-07T06:13:53.0069649Z * [new branch] gh/justinchuby/112/head -> origin/gh/justinchuby/112/head 2025-09-07T06:13:53.0070774Z * [new branch] gh/justinchuby/112/orig -> origin/gh/justinchuby/112/orig 2025-09-07T06:13:53.0072279Z * [new branch] gh/justinchuby/113/base -> origin/gh/justinchuby/113/base 2025-09-07T06:13:53.0073554Z * [new branch] gh/justinchuby/113/head -> origin/gh/justinchuby/113/head 2025-09-07T06:13:53.0074756Z * [new branch] gh/justinchuby/113/orig -> origin/gh/justinchuby/113/orig 2025-09-07T06:13:53.0076221Z * [new branch] gh/justinchuby/114/base -> origin/gh/justinchuby/114/base 2025-09-07T06:13:53.0077326Z * [new branch] gh/justinchuby/114/head -> origin/gh/justinchuby/114/head 2025-09-07T06:13:53.0078474Z * [new branch] gh/justinchuby/114/orig -> origin/gh/justinchuby/114/orig 2025-09-07T06:13:53.0079967Z * [new 
branch] gh/justinchuby/115/base -> origin/gh/justinchuby/115/base 2025-09-07T06:13:53.0081080Z * [new branch] gh/justinchuby/115/head -> origin/gh/justinchuby/115/head 2025-09-07T06:13:53.0082198Z * [new branch] gh/justinchuby/115/orig -> origin/gh/justinchuby/115/orig 2025-09-07T06:13:53.0084006Z * [new branch] gh/karthickai/1/base -> origin/gh/karthickai/1/base 2025-09-07T06:13:53.0085641Z * [new branch] gh/karthickai/1/head -> origin/gh/karthickai/1/head 2025-09-07T06:13:53.0086801Z * [new branch] gh/karthickai/1/orig -> origin/gh/karthickai/1/orig 2025-09-07T06:13:53.0088315Z * [new branch] gh/karthickai/2/base -> origin/gh/karthickai/2/base 2025-09-07T06:13:53.0089433Z * [new branch] gh/karthickai/2/head -> origin/gh/karthickai/2/head 2025-09-07T06:13:53.0090573Z * [new branch] gh/karthickai/2/orig -> origin/gh/karthickai/2/orig 2025-09-07T06:13:53.0092728Z * [new branch] gh/kurtamohler/32/base -> origin/gh/kurtamohler/32/base 2025-09-07T06:13:53.0093972Z * [new branch] gh/kurtamohler/32/head -> origin/gh/kurtamohler/32/head 2025-09-07T06:13:53.0095018Z * [new branch] gh/kurtamohler/32/orig -> origin/gh/kurtamohler/32/orig 2025-09-07T06:13:53.0096570Z * [new branch] gh/kurtamohler/33/base -> origin/gh/kurtamohler/33/base 2025-09-07T06:13:53.0097734Z * [new branch] gh/kurtamohler/33/head -> origin/gh/kurtamohler/33/head 2025-09-07T06:13:53.0098856Z * [new branch] gh/kurtamohler/33/orig -> origin/gh/kurtamohler/33/orig 2025-09-07T06:13:53.0100453Z * [new branch] gh/kurtamohler/34/base -> origin/gh/kurtamohler/34/base 2025-09-07T06:13:53.0101718Z * [new branch] gh/kurtamohler/34/head -> origin/gh/kurtamohler/34/head 2025-09-07T06:13:53.0102854Z * [new branch] gh/kurtamohler/34/orig -> origin/gh/kurtamohler/34/orig 2025-09-07T06:13:53.0104516Z * [new branch] gh/kurtamohler/41/base -> origin/gh/kurtamohler/41/base 2025-09-07T06:13:53.0105600Z * [new branch] gh/kurtamohler/41/head -> origin/gh/kurtamohler/41/head 2025-09-07T06:13:53.0106716Z * [new branch] 
gh/kurtamohler/41/orig -> origin/gh/kurtamohler/41/orig 2025-09-07T06:13:53.0108259Z * [new branch] gh/kurtamohler/46/base -> origin/gh/kurtamohler/46/base 2025-09-07T06:13:53.0109406Z * [new branch] gh/kurtamohler/46/head -> origin/gh/kurtamohler/46/head 2025-09-07T06:13:53.0110489Z * [new branch] gh/kurtamohler/46/orig -> origin/gh/kurtamohler/46/orig 2025-09-07T06:13:53.0112018Z * [new branch] gh/kurtamohler/47/base -> origin/gh/kurtamohler/47/base 2025-09-07T06:13:53.0113158Z * [new branch] gh/kurtamohler/47/head -> origin/gh/kurtamohler/47/head 2025-09-07T06:13:53.0114299Z * [new branch] gh/kurtamohler/47/orig -> origin/gh/kurtamohler/47/orig 2025-09-07T06:13:53.0115796Z * [new branch] gh/kurtamohler/48/base -> origin/gh/kurtamohler/48/base 2025-09-07T06:13:53.0116883Z * [new branch] gh/kurtamohler/48/head -> origin/gh/kurtamohler/48/head 2025-09-07T06:13:53.0117994Z * [new branch] gh/kurtamohler/48/orig -> origin/gh/kurtamohler/48/orig 2025-09-07T06:13:53.0119591Z * [new branch] gh/kurtamohler/49/base -> origin/gh/kurtamohler/49/base 2025-09-07T06:13:53.0120660Z * [new branch] gh/kurtamohler/49/head -> origin/gh/kurtamohler/49/head 2025-09-07T06:13:53.0121874Z * [new branch] gh/kurtamohler/49/orig -> origin/gh/kurtamohler/49/orig 2025-09-07T06:13:53.0123388Z * [new branch] gh/kurtamohler/50/base -> origin/gh/kurtamohler/50/base 2025-09-07T06:13:53.0124502Z * [new branch] gh/kurtamohler/50/head -> origin/gh/kurtamohler/50/head 2025-09-07T06:13:53.0125615Z * [new branch] gh/kurtamohler/50/orig -> origin/gh/kurtamohler/50/orig 2025-09-07T06:13:53.0127523Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-09-07T06:13:53.0128834Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-09-07T06:13:53.0130006Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-09-07T06:13:53.0131603Z * [new branch] gh/kwen2501/15/base -> origin/gh/kwen2501/15/base 2025-09-07T06:13:53.0133089Z * [new branch] 
gh/kwen2501/15/head -> origin/gh/kwen2501/15/head 2025-09-07T06:13:53.0134672Z * [new branch] gh/kwen2501/156/base -> origin/gh/kwen2501/156/base 2025-09-07T06:13:53.0135810Z * [new branch] gh/kwen2501/156/head -> origin/gh/kwen2501/156/head 2025-09-07T06:13:53.0136944Z * [new branch] gh/kwen2501/156/orig -> origin/gh/kwen2501/156/orig 2025-09-07T06:13:53.0138543Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-09-07T06:13:53.0139598Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-09-07T06:13:53.0141233Z * [new branch] gh/kwen2501/186/base -> origin/gh/kwen2501/186/base 2025-09-07T06:13:53.0142393Z * [new branch] gh/kwen2501/186/head -> origin/gh/kwen2501/186/head 2025-09-07T06:13:53.0143605Z * [new branch] gh/kwen2501/186/orig -> origin/gh/kwen2501/186/orig 2025-09-07T06:13:53.0145095Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-09-07T06:13:53.0146286Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-09-07T06:13:53.0147437Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-09-07T06:13:53.0149294Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-09-07T06:13:53.0150578Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-09-07T06:13:53.0151705Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-09-07T06:13:53.0153383Z * [new branch] gh/kwen2501/194/base -> origin/gh/kwen2501/194/base 2025-09-07T06:13:53.0154547Z * [new branch] gh/kwen2501/194/head -> origin/gh/kwen2501/194/head 2025-09-07T06:13:53.0155689Z * [new branch] gh/kwen2501/194/orig -> origin/gh/kwen2501/194/orig 2025-09-07T06:13:53.0157235Z * [new branch] gh/kwen2501/199/base -> origin/gh/kwen2501/199/base 2025-09-07T06:13:53.0158369Z * [new branch] gh/kwen2501/199/head -> origin/gh/kwen2501/199/head 2025-09-07T06:13:53.0159531Z * [new branch] gh/kwen2501/199/orig -> origin/gh/kwen2501/199/orig 2025-09-07T06:13:53.0161048Z 
* [new branch] gh/kwen2501/200/base -> origin/gh/kwen2501/200/base 2025-09-07T06:13:53.0162341Z * [new branch] gh/kwen2501/200/head -> origin/gh/kwen2501/200/head 2025-09-07T06:13:53.0163435Z * [new branch] gh/kwen2501/200/orig -> origin/gh/kwen2501/200/orig 2025-09-07T06:13:53.0164971Z * [new branch] gh/kwen2501/201/base -> origin/gh/kwen2501/201/base 2025-09-07T06:13:53.0166059Z * [new branch] gh/kwen2501/201/head -> origin/gh/kwen2501/201/head 2025-09-07T06:13:53.0167337Z * [new branch] gh/kwen2501/201/orig -> origin/gh/kwen2501/201/orig 2025-09-07T06:13:53.0168826Z * [new branch] gh/kwen2501/203/base -> origin/gh/kwen2501/203/base 2025-09-07T06:13:53.0169931Z * [new branch] gh/kwen2501/203/head -> origin/gh/kwen2501/203/head 2025-09-07T06:13:53.0171050Z * [new branch] gh/kwen2501/203/orig -> origin/gh/kwen2501/203/orig 2025-09-07T06:13:53.0172881Z * [new branch] gh/kwen2501/204/base -> origin/gh/kwen2501/204/base 2025-09-07T06:13:53.0174009Z * [new branch] gh/kwen2501/204/head -> origin/gh/kwen2501/204/head 2025-09-07T06:13:53.0175160Z * [new branch] gh/kwen2501/204/orig -> origin/gh/kwen2501/204/orig 2025-09-07T06:13:53.0176714Z * [new branch] gh/kwen2501/205/base -> origin/gh/kwen2501/205/base 2025-09-07T06:13:53.0177816Z * [new branch] gh/kwen2501/205/head -> origin/gh/kwen2501/205/head 2025-09-07T06:13:53.0178990Z * [new branch] gh/kwen2501/205/orig -> origin/gh/kwen2501/205/orig 2025-09-07T06:13:53.0180608Z * [new branch] gh/kwen2501/206/base -> origin/gh/kwen2501/206/base 2025-09-07T06:13:53.0181754Z * [new branch] gh/kwen2501/206/head -> origin/gh/kwen2501/206/head 2025-09-07T06:13:53.0182929Z * [new branch] gh/kwen2501/206/orig -> origin/gh/kwen2501/206/orig 2025-09-07T06:13:53.0184645Z * [new branch] gh/kwen2501/207/base -> origin/gh/kwen2501/207/base 2025-09-07T06:13:53.0185625Z * [new branch] gh/kwen2501/207/head -> origin/gh/kwen2501/207/head 2025-09-07T06:13:53.0186773Z * [new branch] gh/kwen2501/207/orig -> origin/gh/kwen2501/207/orig 
2025-09-07T06:13:53.0188257Z * [new branch] gh/kwen2501/208/base -> origin/gh/kwen2501/208/base 2025-09-07T06:13:53.0189360Z * [new branch] gh/kwen2501/208/head -> origin/gh/kwen2501/208/head 2025-09-07T06:13:53.0190451Z * [new branch] gh/kwen2501/208/orig -> origin/gh/kwen2501/208/orig 2025-09-07T06:13:53.0192393Z * [new branch] gh/kwen2501/209/base -> origin/gh/kwen2501/209/base 2025-09-07T06:13:53.0193547Z * [new branch] gh/kwen2501/209/head -> origin/gh/kwen2501/209/head 2025-09-07T06:13:53.0194657Z * [new branch] gh/kwen2501/209/orig -> origin/gh/kwen2501/209/orig 2025-09-07T06:13:53.0196257Z * [new branch] gh/kwen2501/210/base -> origin/gh/kwen2501/210/base 2025-09-07T06:13:53.0197379Z * [new branch] gh/kwen2501/210/head -> origin/gh/kwen2501/210/head 2025-09-07T06:13:53.0198567Z * [new branch] gh/kwen2501/210/orig -> origin/gh/kwen2501/210/orig 2025-09-07T06:13:53.0200100Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-09-07T06:13:53.0201224Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-09-07T06:13:53.0202724Z * [new branch] gh/kwen2501/212/base -> origin/gh/kwen2501/212/base 2025-09-07T06:13:53.0203832Z * [new branch] gh/kwen2501/212/head -> origin/gh/kwen2501/212/head 2025-09-07T06:13:53.0204948Z * [new branch] gh/kwen2501/212/orig -> origin/gh/kwen2501/212/orig 2025-09-07T06:13:53.0206464Z * [new branch] gh/kwen2501/213/base -> origin/gh/kwen2501/213/base 2025-09-07T06:13:53.0207631Z * [new branch] gh/kwen2501/213/head -> origin/gh/kwen2501/213/head 2025-09-07T06:13:53.0208768Z * [new branch] gh/kwen2501/213/orig -> origin/gh/kwen2501/213/orig 2025-09-07T06:13:53.0210825Z * [new branch] gh/kwen2501/214/base -> origin/gh/kwen2501/214/base 2025-09-07T06:13:53.0212277Z * [new branch] gh/kwen2501/214/head -> origin/gh/kwen2501/214/head 2025-09-07T06:13:53.0213439Z * [new branch] gh/kwen2501/214/orig -> origin/gh/kwen2501/214/orig 2025-09-07T06:13:53.0215062Z * [new branch] gh/kwen2501/215/base -> 
origin/gh/kwen2501/215/base 2025-09-07T06:13:53.0216211Z * [new branch] gh/kwen2501/215/head -> origin/gh/kwen2501/215/head 2025-09-07T06:13:53.0217363Z * [new branch] gh/kwen2501/215/orig -> origin/gh/kwen2501/215/orig 2025-09-07T06:13:53.0218883Z * [new branch] gh/kwen2501/216/base -> origin/gh/kwen2501/216/base 2025-09-07T06:13:53.0220030Z * [new branch] gh/kwen2501/216/head -> origin/gh/kwen2501/216/head 2025-09-07T06:13:53.0221208Z * [new branch] gh/kwen2501/216/orig -> origin/gh/kwen2501/216/orig 2025-09-07T06:13:53.0222735Z * [new branch] gh/kwen2501/217/base -> origin/gh/kwen2501/217/base 2025-09-07T06:13:53.0223873Z * [new branch] gh/kwen2501/217/head -> origin/gh/kwen2501/217/head 2025-09-07T06:13:53.0225108Z * [new branch] gh/kwen2501/217/orig -> origin/gh/kwen2501/217/orig 2025-09-07T06:13:53.0226661Z * [new branch] gh/kwen2501/218/base -> origin/gh/kwen2501/218/base 2025-09-07T06:13:53.0227794Z * [new branch] gh/kwen2501/218/head -> origin/gh/kwen2501/218/head 2025-09-07T06:13:53.0228871Z * [new branch] gh/kwen2501/218/orig -> origin/gh/kwen2501/218/orig 2025-09-07T06:13:53.0230461Z * [new branch] gh/kwen2501/219/base -> origin/gh/kwen2501/219/base 2025-09-07T06:13:53.0231481Z * [new branch] gh/kwen2501/219/head -> origin/gh/kwen2501/219/head 2025-09-07T06:13:53.0232609Z * [new branch] gh/kwen2501/219/orig -> origin/gh/kwen2501/219/orig 2025-09-07T06:13:53.0234301Z * [new branch] gh/kwen2501/220/base -> origin/gh/kwen2501/220/base 2025-09-07T06:13:53.0235398Z * [new branch] gh/kwen2501/220/head -> origin/gh/kwen2501/220/head 2025-09-07T06:13:53.0236606Z * [new branch] gh/kwen2501/220/orig -> origin/gh/kwen2501/220/orig 2025-09-07T06:13:53.0238198Z * [new branch] gh/kwen2501/221/base -> origin/gh/kwen2501/221/base 2025-09-07T06:13:53.0239260Z * [new branch] gh/kwen2501/221/head -> origin/gh/kwen2501/221/head 2025-09-07T06:13:53.0240358Z * [new branch] gh/kwen2501/221/orig -> origin/gh/kwen2501/221/orig 2025-09-07T06:13:53.0241908Z * [new branch] 
gh/kwen2501/222/base -> origin/gh/kwen2501/222/base 2025-09-07T06:13:53.0243114Z * [new branch] gh/kwen2501/222/head -> origin/gh/kwen2501/222/head 2025-09-07T06:13:53.0244120Z * [new branch] gh/kwen2501/222/orig -> origin/gh/kwen2501/222/orig 2025-09-07T06:13:53.0246075Z * [new branch] gh/kwen2501/223/base -> origin/gh/kwen2501/223/base 2025-09-07T06:13:53.0247166Z * [new branch] gh/kwen2501/223/head -> origin/gh/kwen2501/223/head 2025-09-07T06:13:53.0248328Z * [new branch] gh/kwen2501/223/orig -> origin/gh/kwen2501/223/orig 2025-09-07T06:13:53.0250405Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-09-07T06:13:53.0251601Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-09-07T06:13:53.0252863Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-09-07T06:13:53.0254505Z * [new branch] gh/kwen2501/225/base -> origin/gh/kwen2501/225/base 2025-09-07T06:13:53.0255593Z * [new branch] gh/kwen2501/225/head -> origin/gh/kwen2501/225/head 2025-09-07T06:13:53.0256741Z * [new branch] gh/kwen2501/225/orig -> origin/gh/kwen2501/225/orig 2025-09-07T06:13:53.0258270Z * [new branch] gh/kwen2501/226/base -> origin/gh/kwen2501/226/base 2025-09-07T06:13:53.0259385Z * [new branch] gh/kwen2501/226/head -> origin/gh/kwen2501/226/head 2025-09-07T06:13:53.0260592Z * [new branch] gh/kwen2501/226/orig -> origin/gh/kwen2501/226/orig 2025-09-07T06:13:53.0262189Z * [new branch] gh/kwen2501/227/base -> origin/gh/kwen2501/227/base 2025-09-07T06:13:53.0263425Z * [new branch] gh/kwen2501/227/head -> origin/gh/kwen2501/227/head 2025-09-07T06:13:53.0264572Z * [new branch] gh/kwen2501/227/orig -> origin/gh/kwen2501/227/orig 2025-09-07T06:13:53.0266140Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-09-07T06:13:53.0267216Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-09-07T06:13:53.0268360Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 
2025-09-07T06:13:53.0269906Z * [new branch] gh/kwen2501/229/base -> origin/gh/kwen2501/229/base 2025-09-07T06:13:53.0271031Z * [new branch] gh/kwen2501/229/head -> origin/gh/kwen2501/229/head 2025-09-07T06:13:53.0272137Z * [new branch] gh/kwen2501/229/orig -> origin/gh/kwen2501/229/orig 2025-09-07T06:13:53.0273683Z * [new branch] gh/kwen2501/230/base -> origin/gh/kwen2501/230/base 2025-09-07T06:13:53.0274807Z * [new branch] gh/kwen2501/230/head -> origin/gh/kwen2501/230/head 2025-09-07T06:13:53.0276076Z * [new branch] gh/kwen2501/230/orig -> origin/gh/kwen2501/230/orig 2025-09-07T06:13:53.0277530Z * [new branch] gh/kwen2501/231/base -> origin/gh/kwen2501/231/base 2025-09-07T06:13:53.0278696Z * [new branch] gh/kwen2501/231/head -> origin/gh/kwen2501/231/head 2025-09-07T06:13:53.0279843Z * [new branch] gh/kwen2501/231/orig -> origin/gh/kwen2501/231/orig 2025-09-07T06:13:53.0281382Z * [new branch] gh/kwen2501/232/base -> origin/gh/kwen2501/232/base 2025-09-07T06:13:53.0282507Z * [new branch] gh/kwen2501/232/head -> origin/gh/kwen2501/232/head 2025-09-07T06:13:53.0283643Z * [new branch] gh/kwen2501/232/orig -> origin/gh/kwen2501/232/orig 2025-09-07T06:13:53.0285644Z * [new branch] gh/laithsakka/156/base -> origin/gh/laithsakka/156/base 2025-09-07T06:13:53.0286771Z * [new branch] gh/laithsakka/156/head -> origin/gh/laithsakka/156/head 2025-09-07T06:13:53.0287877Z * [new branch] gh/laithsakka/156/orig -> origin/gh/laithsakka/156/orig 2025-09-07T06:13:53.0289585Z * [new branch] gh/laithsakka/160/base -> origin/gh/laithsakka/160/base 2025-09-07T06:13:53.0290650Z * [new branch] gh/laithsakka/160/head -> origin/gh/laithsakka/160/head 2025-09-07T06:13:53.0291933Z * [new branch] gh/laithsakka/160/orig -> origin/gh/laithsakka/160/orig 2025-09-07T06:13:53.0293736Z * [new branch] gh/laithsakka/178/base -> origin/gh/laithsakka/178/base 2025-09-07T06:13:53.0294942Z * [new branch] gh/laithsakka/178/head -> origin/gh/laithsakka/178/head 2025-09-07T06:13:53.0296086Z * [new branch] 
gh/laithsakka/178/orig -> origin/gh/laithsakka/178/orig 2025-09-07T06:13:53.0297659Z * [new branch] gh/laithsakka/191/base -> origin/gh/laithsakka/191/base 2025-09-07T06:13:53.0298844Z * [new branch] gh/laithsakka/191/head -> origin/gh/laithsakka/191/head 2025-09-07T06:13:53.0300485Z * [new branch] gh/laithsakka/191/orig -> origin/gh/laithsakka/191/orig 2025-09-07T06:13:53.0302197Z * [new branch] gh/laithsakka/237/base -> origin/gh/laithsakka/237/base 2025-09-07T06:13:53.0303316Z * [new branch] gh/laithsakka/237/head -> origin/gh/laithsakka/237/head 2025-09-07T06:13:53.0304579Z * [new branch] gh/laithsakka/237/orig -> origin/gh/laithsakka/237/orig 2025-09-07T06:13:53.0306183Z * [new branch] gh/laithsakka/249/base -> origin/gh/laithsakka/249/base 2025-09-07T06:13:53.0307260Z * [new branch] gh/laithsakka/249/head -> origin/gh/laithsakka/249/head 2025-09-07T06:13:53.0308393Z * [new branch] gh/laithsakka/249/orig -> origin/gh/laithsakka/249/orig 2025-09-07T06:13:53.0309922Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-09-07T06:13:53.0311058Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-09-07T06:13:53.0312130Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-09-07T06:13:53.0313799Z * [new branch] gh/laithsakka/254/base -> origin/gh/laithsakka/254/base 2025-09-07T06:13:53.0314837Z * [new branch] gh/laithsakka/254/head -> origin/gh/laithsakka/254/head 2025-09-07T06:13:53.0316089Z * [new branch] gh/laithsakka/254/orig -> origin/gh/laithsakka/254/orig 2025-09-07T06:13:53.0317703Z * [new branch] gh/laithsakka/255/base -> origin/gh/laithsakka/255/base 2025-09-07T06:13:53.0318731Z * [new branch] gh/laithsakka/255/head -> origin/gh/laithsakka/255/head 2025-09-07T06:13:53.0319786Z * [new branch] gh/laithsakka/255/orig -> origin/gh/laithsakka/255/orig 2025-09-07T06:13:53.0321234Z * [new branch] gh/laithsakka/256/base -> origin/gh/laithsakka/256/base 2025-09-07T06:13:53.0322475Z * [new branch] 
gh/laithsakka/256/head -> origin/gh/laithsakka/256/head 2025-09-07T06:13:53.0323404Z * [new branch] gh/laithsakka/256/orig -> origin/gh/laithsakka/256/orig 2025-09-07T06:13:53.0325020Z * [new branch] gh/laithsakka/257/base -> origin/gh/laithsakka/257/base 2025-09-07T06:13:53.0326140Z * [new branch] gh/laithsakka/257/head -> origin/gh/laithsakka/257/head 2025-09-07T06:13:53.0327245Z * [new branch] gh/laithsakka/257/orig -> origin/gh/laithsakka/257/orig 2025-09-07T06:13:53.0328812Z * [new branch] gh/laithsakka/258/base -> origin/gh/laithsakka/258/base 2025-09-07T06:13:53.0329927Z * [new branch] gh/laithsakka/258/head -> origin/gh/laithsakka/258/head 2025-09-07T06:13:53.0331026Z * [new branch] gh/laithsakka/258/orig -> origin/gh/laithsakka/258/orig 2025-09-07T06:13:53.0332972Z * [new branch] gh/laithsakka/259/base -> origin/gh/laithsakka/259/base 2025-09-07T06:13:53.0334128Z * [new branch] gh/laithsakka/259/head -> origin/gh/laithsakka/259/head 2025-09-07T06:13:53.0335280Z * [new branch] gh/laithsakka/259/orig -> origin/gh/laithsakka/259/orig 2025-09-07T06:13:53.0336816Z * [new branch] gh/laithsakka/260/base -> origin/gh/laithsakka/260/base 2025-09-07T06:13:53.0337997Z * [new branch] gh/laithsakka/260/head -> origin/gh/laithsakka/260/head 2025-09-07T06:13:53.0339162Z * [new branch] gh/laithsakka/260/orig -> origin/gh/laithsakka/260/orig 2025-09-07T06:13:53.0340708Z * [new branch] gh/laithsakka/261/base -> origin/gh/laithsakka/261/base 2025-09-07T06:13:53.0341876Z * [new branch] gh/laithsakka/261/head -> origin/gh/laithsakka/261/head 2025-09-07T06:13:53.0343037Z * [new branch] gh/laithsakka/261/orig -> origin/gh/laithsakka/261/orig 2025-09-07T06:13:53.0345142Z * [new branch] gh/laithsakka/262/base -> origin/gh/laithsakka/262/base 2025-09-07T06:13:53.0346695Z * [new branch] gh/laithsakka/262/head -> origin/gh/laithsakka/262/head 2025-09-07T06:13:53.0347855Z * [new branch] gh/laithsakka/262/orig -> origin/gh/laithsakka/262/orig 2025-09-07T06:13:53.0350310Z * [new branch] 
gh/laithsakka/263/base -> origin/gh/laithsakka/263/base 2025-09-07T06:13:53.0351493Z * [new branch] gh/laithsakka/263/head -> origin/gh/laithsakka/263/head 2025-09-07T06:13:53.0352637Z * [new branch] gh/laithsakka/263/orig -> origin/gh/laithsakka/263/orig 2025-09-07T06:13:53.0354129Z * [new branch] gh/laithsakka/264/base -> origin/gh/laithsakka/264/base 2025-09-07T06:13:53.0355321Z * [new branch] gh/laithsakka/264/head -> origin/gh/laithsakka/264/head 2025-09-07T06:13:53.0356485Z * [new branch] gh/laithsakka/264/orig -> origin/gh/laithsakka/264/orig 2025-09-07T06:13:53.0358254Z * [new branch] gh/laithsakka/265/base -> origin/gh/laithsakka/265/base 2025-09-07T06:13:53.0359328Z * [new branch] gh/laithsakka/265/head -> origin/gh/laithsakka/265/head 2025-09-07T06:13:53.0360501Z * [new branch] gh/laithsakka/265/orig -> origin/gh/laithsakka/265/orig 2025-09-07T06:13:53.0362193Z * [new branch] gh/laithsakka/266/base -> origin/gh/laithsakka/266/base 2025-09-07T06:13:53.0363298Z * [new branch] gh/laithsakka/266/head -> origin/gh/laithsakka/266/head 2025-09-07T06:13:53.0364389Z * [new branch] gh/laithsakka/266/orig -> origin/gh/laithsakka/266/orig 2025-09-07T06:13:53.0365934Z * [new branch] gh/laithsakka/267/base -> origin/gh/laithsakka/267/base 2025-09-07T06:13:53.0367072Z * [new branch] gh/laithsakka/267/head -> origin/gh/laithsakka/267/head 2025-09-07T06:13:53.0368344Z * [new branch] gh/laithsakka/267/orig -> origin/gh/laithsakka/267/orig 2025-09-07T06:13:53.0369929Z * [new branch] gh/laithsakka/268/base -> origin/gh/laithsakka/268/base 2025-09-07T06:13:53.0371024Z * [new branch] gh/laithsakka/268/head -> origin/gh/laithsakka/268/head 2025-09-07T06:13:53.0372481Z * [new branch] gh/laithsakka/268/orig -> origin/gh/laithsakka/268/orig 2025-09-07T06:13:53.0374205Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-09-07T06:13:53.0375651Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-09-07T06:13:53.0377089Z * [new branch] 
gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-09-07T06:13:53.0378254Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-09-07T06:13:53.0380272Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-09-07T06:13:53.0381394Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-09-07T06:13:53.0382843Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-09-07T06:13:53.0383842Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-09-07T06:13:53.0388022Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-09-07T06:13:53.0389138Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-09-07T06:13:53.0390800Z * [new branch] gh/lucaskabela/10/base -> origin/gh/lucaskabela/10/base 2025-09-07T06:13:53.0391902Z * [new branch] gh/lucaskabela/10/head -> origin/gh/lucaskabela/10/head 2025-09-07T06:13:53.0393032Z * [new branch] gh/lucaskabela/10/orig -> origin/gh/lucaskabela/10/orig 2025-09-07T06:13:53.0394434Z * [new branch] gh/lucaskabela/11/base -> origin/gh/lucaskabela/11/base 2025-09-07T06:13:53.0395533Z * [new branch] gh/lucaskabela/11/head -> origin/gh/lucaskabela/11/head 2025-09-07T06:13:53.0396689Z * [new branch] gh/lucaskabela/11/orig -> origin/gh/lucaskabela/11/orig 2025-09-07T06:13:53.0398077Z * [new branch] gh/lucaskabela/12/base -> origin/gh/lucaskabela/12/base 2025-09-07T06:13:53.0399178Z * [new branch] gh/lucaskabela/12/head -> origin/gh/lucaskabela/12/head 2025-09-07T06:13:53.0400316Z * [new branch] gh/lucaskabela/12/orig -> origin/gh/lucaskabela/12/orig 2025-09-07T06:13:53.0401696Z * [new branch] gh/lucaskabela/13/base -> origin/gh/lucaskabela/13/base 2025-09-07T06:13:53.0402799Z * [new branch] gh/lucaskabela/13/head -> origin/gh/lucaskabela/13/head 2025-09-07T06:13:53.0403966Z * [new branch] gh/lucaskabela/13/orig -> origin/gh/lucaskabela/13/orig 2025-09-07T06:13:53.0405382Z * [new branch] 
gh/lucaskabela/14/base -> origin/gh/lucaskabela/14/base 2025-09-07T06:13:53.0406486Z * [new branch] gh/lucaskabela/14/head -> origin/gh/lucaskabela/14/head 2025-09-07T06:13:53.0407642Z * [new branch] gh/lucaskabela/14/orig -> origin/gh/lucaskabela/14/orig 2025-09-07T06:13:53.0409071Z * [new branch] gh/lucaskabela/15/base -> origin/gh/lucaskabela/15/base 2025-09-07T06:13:53.0410159Z * [new branch] gh/lucaskabela/15/head -> origin/gh/lucaskabela/15/head 2025-09-07T06:13:53.0411398Z * [new branch] gh/lucaskabela/15/orig -> origin/gh/lucaskabela/15/orig 2025-09-07T06:13:53.0413132Z * [new branch] gh/lucaskabela/16/base -> origin/gh/lucaskabela/16/base 2025-09-07T06:13:53.0414274Z * [new branch] gh/lucaskabela/16/head -> origin/gh/lucaskabela/16/head 2025-09-07T06:13:53.0415436Z * [new branch] gh/lucaskabela/16/orig -> origin/gh/lucaskabela/16/orig 2025-09-07T06:13:53.0417011Z * [new branch] gh/lucaskabela/17/base -> origin/gh/lucaskabela/17/base 2025-09-07T06:13:53.0417886Z * [new branch] gh/lucaskabela/17/head -> origin/gh/lucaskabela/17/head 2025-09-07T06:13:53.0419089Z * [new branch] gh/lucaskabela/17/orig -> origin/gh/lucaskabela/17/orig 2025-09-07T06:13:53.0420665Z * [new branch] gh/lucaskabela/2/base -> origin/gh/lucaskabela/2/base 2025-09-07T06:13:53.0422359Z * [new branch] gh/lucaskabela/2/head -> origin/gh/lucaskabela/2/head 2025-09-07T06:13:53.0423508Z * [new branch] gh/lucaskabela/2/orig -> origin/gh/lucaskabela/2/orig 2025-09-07T06:13:53.0425210Z * [new branch] gh/lucaskabela/3/base -> origin/gh/lucaskabela/3/base 2025-09-07T06:13:53.0426364Z * [new branch] gh/lucaskabela/3/head -> origin/gh/lucaskabela/3/head 2025-09-07T06:13:53.0427503Z * [new branch] gh/lucaskabela/3/orig -> origin/gh/lucaskabela/3/orig 2025-09-07T06:13:53.0428915Z * [new branch] gh/lucaskabela/4/base -> origin/gh/lucaskabela/4/base 2025-09-07T06:13:53.0430027Z * [new branch] gh/lucaskabela/4/head -> origin/gh/lucaskabela/4/head 2025-09-07T06:13:53.0431127Z * [new branch] 
gh/lucaskabela/4/orig -> origin/gh/lucaskabela/4/orig 2025-09-07T06:13:53.0432786Z * [new branch] gh/lucaskabela/5/base -> origin/gh/lucaskabela/5/base 2025-09-07T06:13:53.0433847Z * [new branch] gh/lucaskabela/5/head -> origin/gh/lucaskabela/5/head 2025-09-07T06:13:53.0434972Z * [new branch] gh/lucaskabela/5/orig -> origin/gh/lucaskabela/5/orig 2025-09-07T06:13:53.0436397Z * [new branch] gh/lucaskabela/6/base -> origin/gh/lucaskabela/6/base 2025-09-07T06:13:53.0437521Z * [new branch] gh/lucaskabela/6/head -> origin/gh/lucaskabela/6/head 2025-09-07T06:13:53.0438774Z * [new branch] gh/lucaskabela/6/orig -> origin/gh/lucaskabela/6/orig 2025-09-07T06:13:53.0440354Z * [new branch] gh/lucaskabela/7/base -> origin/gh/lucaskabela/7/base 2025-09-07T06:13:53.0441439Z * [new branch] gh/lucaskabela/7/head -> origin/gh/lucaskabela/7/head 2025-09-07T06:13:53.0442593Z * [new branch] gh/lucaskabela/7/orig -> origin/gh/lucaskabela/7/orig 2025-09-07T06:13:53.0444041Z * [new branch] gh/lucaskabela/8/base -> origin/gh/lucaskabela/8/base 2025-09-07T06:13:53.0445274Z * [new branch] gh/lucaskabela/8/head -> origin/gh/lucaskabela/8/head 2025-09-07T06:13:53.0446427Z * [new branch] gh/lucaskabela/8/orig -> origin/gh/lucaskabela/8/orig 2025-09-07T06:13:53.0447934Z * [new branch] gh/lucaskabela/9/base -> origin/gh/lucaskabela/9/base 2025-09-07T06:13:53.0449235Z * [new branch] gh/lucaskabela/9/head -> origin/gh/lucaskabela/9/head 2025-09-07T06:13:53.0450724Z * [new branch] gh/lucaskabela/9/orig -> origin/gh/lucaskabela/9/orig 2025-09-07T06:13:53.0452695Z * [new branch] gh/lw/3/base -> origin/gh/lw/3/base 2025-09-07T06:13:53.0453835Z * [new branch] gh/lw/3/head -> origin/gh/lw/3/head 2025-09-07T06:13:53.0455106Z * [new branch] gh/lw/3/orig -> origin/gh/lw/3/orig 2025-09-07T06:13:53.0456856Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-09-07T06:13:53.0458454Z * [new branch] gh/malfet/330/base -> origin/gh/malfet/330/base 2025-09-07T06:13:53.0459738Z * [new branch] 
gh/malfet/330/head -> origin/gh/malfet/330/head 2025-09-07T06:13:53.0460799Z * [new branch] gh/malfet/330/orig -> origin/gh/malfet/330/orig 2025-09-07T06:13:53.0462391Z * [new branch] gh/malfet/396/base -> origin/gh/malfet/396/base 2025-09-07T06:13:53.0463822Z * [new branch] gh/malfet/396/head -> origin/gh/malfet/396/head 2025-09-07T06:13:53.0464937Z * [new branch] gh/malfet/396/orig -> origin/gh/malfet/396/orig 2025-09-07T06:13:53.0466921Z * [new branch] gh/malfet/397/base -> origin/gh/malfet/397/base 2025-09-07T06:13:53.0468100Z * [new branch] gh/malfet/397/head -> origin/gh/malfet/397/head 2025-09-07T06:13:53.0469248Z * [new branch] gh/malfet/397/orig -> origin/gh/malfet/397/orig 2025-09-07T06:13:53.0470750Z * [new branch] gh/malfet/398/base -> origin/gh/malfet/398/base 2025-09-07T06:13:53.0471829Z * [new branch] gh/malfet/398/head -> origin/gh/malfet/398/head 2025-09-07T06:13:53.0472998Z * [new branch] gh/malfet/398/orig -> origin/gh/malfet/398/orig 2025-09-07T06:13:53.0474495Z * [new branch] gh/malfet/399/base -> origin/gh/malfet/399/base 2025-09-07T06:13:53.0475633Z * [new branch] gh/malfet/399/head -> origin/gh/malfet/399/head 2025-09-07T06:13:53.0476721Z * [new branch] gh/malfet/399/orig -> origin/gh/malfet/399/orig 2025-09-07T06:13:53.0478258Z * [new branch] gh/malfet/414/base -> origin/gh/malfet/414/base 2025-09-07T06:13:53.0479438Z * [new branch] gh/malfet/414/head -> origin/gh/malfet/414/head 2025-09-07T06:13:53.0480518Z * [new branch] gh/malfet/414/orig -> origin/gh/malfet/414/orig 2025-09-07T06:13:53.0482030Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-09-07T06:13:53.0483133Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-09-07T06:13:53.0484273Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-09-07T06:13:53.0485709Z * [new branch] gh/malfet/418/base -> origin/gh/malfet/418/base 2025-09-07T06:13:53.0486876Z * [new branch] gh/malfet/418/head -> origin/gh/malfet/418/head 
2025-09-07T06:13:53.0487980Z * [new branch] gh/malfet/418/orig -> origin/gh/malfet/418/orig 2025-09-07T06:13:53.0489575Z * [new branch] gh/malfet/475/base -> origin/gh/malfet/475/base 2025-09-07T06:13:53.0490759Z * [new branch] gh/malfet/475/head -> origin/gh/malfet/475/head 2025-09-07T06:13:53.0492199Z * [new branch] gh/malfet/475/orig -> origin/gh/malfet/475/orig 2025-09-07T06:13:53.0493852Z * [new branch] gh/malfet/476/base -> origin/gh/malfet/476/base 2025-09-07T06:13:53.0495022Z * [new branch] gh/malfet/476/head -> origin/gh/malfet/476/head 2025-09-07T06:13:53.0496198Z * [new branch] gh/malfet/476/orig -> origin/gh/malfet/476/orig 2025-09-07T06:13:53.0497647Z * [new branch] gh/malfet/477/base -> origin/gh/malfet/477/base 2025-09-07T06:13:53.0498793Z * [new branch] gh/malfet/477/head -> origin/gh/malfet/477/head 2025-09-07T06:13:53.0500049Z * [new branch] gh/malfet/477/orig -> origin/gh/malfet/477/orig 2025-09-07T06:13:53.0501460Z * [new branch] gh/malfet/478/base -> origin/gh/malfet/478/base 2025-09-07T06:13:53.0502665Z * [new branch] gh/malfet/478/head -> origin/gh/malfet/478/head 2025-09-07T06:13:53.0503817Z * [new branch] gh/malfet/478/orig -> origin/gh/malfet/478/orig 2025-09-07T06:13:53.0505365Z * [new branch] gh/malfet/479/base -> origin/gh/malfet/479/base 2025-09-07T06:13:53.0506685Z * [new branch] gh/malfet/479/head -> origin/gh/malfet/479/head 2025-09-07T06:13:53.0507890Z * [new branch] gh/malfet/479/orig -> origin/gh/malfet/479/orig 2025-09-07T06:13:53.0509434Z * [new branch] gh/malfet/480/base -> origin/gh/malfet/480/base 2025-09-07T06:13:53.0512548Z * [new branch] gh/malfet/480/head -> origin/gh/malfet/480/head 2025-09-07T06:13:53.0513846Z * [new branch] gh/malfet/480/orig -> origin/gh/malfet/480/orig 2025-09-07T06:13:53.0514082Z * [new branch] gh/malfet/481/base -> origin/gh/malfet/481/base 2025-09-07T06:13:53.0514326Z * [new branch] gh/malfet/481/head -> origin/gh/malfet/481/head 2025-09-07T06:13:53.0515504Z * [new branch] gh/malfet/481/orig -> 
origin/gh/malfet/481/orig 2025-09-07T06:13:53.0516887Z * [new branch] gh/malfet/482/base -> origin/gh/malfet/482/base 2025-09-07T06:13:53.0518030Z * [new branch] gh/malfet/482/head -> origin/gh/malfet/482/head 2025-09-07T06:13:53.0519196Z * [new branch] gh/malfet/482/orig -> origin/gh/malfet/482/orig 2025-09-07T06:13:53.0521119Z * [new branch] gh/malfet/483/base -> origin/gh/malfet/483/base 2025-09-07T06:13:53.0522276Z * [new branch] gh/malfet/483/head -> origin/gh/malfet/483/head 2025-09-07T06:13:53.0523433Z * [new branch] gh/malfet/483/orig -> origin/gh/malfet/483/orig 2025-09-07T06:13:53.0525013Z * [new branch] gh/malfet/484/base -> origin/gh/malfet/484/base 2025-09-07T06:13:53.0526150Z * [new branch] gh/malfet/484/head -> origin/gh/malfet/484/head 2025-09-07T06:13:53.0527296Z * [new branch] gh/malfet/484/orig -> origin/gh/malfet/484/orig 2025-09-07T06:13:53.0528876Z * [new branch] gh/malfet/485/base -> origin/gh/malfet/485/base 2025-09-07T06:13:53.0529997Z * [new branch] gh/malfet/485/head -> origin/gh/malfet/485/head 2025-09-07T06:13:53.0531203Z * [new branch] gh/malfet/485/orig -> origin/gh/malfet/485/orig 2025-09-07T06:13:53.0533086Z * [new branch] gh/malfet/486/base -> origin/gh/malfet/486/base 2025-09-07T06:13:53.0534209Z * [new branch] gh/malfet/486/head -> origin/gh/malfet/486/head 2025-09-07T06:13:53.0535375Z * [new branch] gh/malfet/486/orig -> origin/gh/malfet/486/orig 2025-09-07T06:13:53.0536952Z * [new branch] gh/malfet/487/base -> origin/gh/malfet/487/base 2025-09-07T06:13:53.0538089Z * [new branch] gh/malfet/487/head -> origin/gh/malfet/487/head 2025-09-07T06:13:53.0539247Z * [new branch] gh/malfet/487/orig -> origin/gh/malfet/487/orig 2025-09-07T06:13:53.0540943Z * [new branch] gh/malfet/488/base -> origin/gh/malfet/488/base 2025-09-07T06:13:53.0542150Z * [new branch] gh/malfet/488/head -> origin/gh/malfet/488/head 2025-09-07T06:13:53.0543305Z * [new branch] gh/malfet/488/orig -> origin/gh/malfet/488/orig 2025-09-07T06:13:53.0544949Z * [new 
branch] gh/malfet/489/base -> origin/gh/malfet/489/base 2025-09-07T06:13:53.0546040Z * [new branch] gh/malfet/489/head -> origin/gh/malfet/489/head 2025-09-07T06:13:53.0547353Z * [new branch] gh/malfet/489/orig -> origin/gh/malfet/489/orig 2025-09-07T06:13:53.0549257Z * [new branch] gh/malfet/490/base -> origin/gh/malfet/490/base 2025-09-07T06:13:53.0553441Z * [new branch] gh/malfet/490/head -> origin/gh/malfet/490/head 2025-09-07T06:13:53.0554704Z * [new branch] gh/malfet/490/orig -> origin/gh/malfet/490/orig 2025-09-07T06:13:53.0556412Z * [new branch] gh/malfet/491/base -> origin/gh/malfet/491/base 2025-09-07T06:13:53.0557682Z * [new branch] gh/malfet/491/head -> origin/gh/malfet/491/head 2025-09-07T06:13:53.0558888Z * [new branch] gh/malfet/491/orig -> origin/gh/malfet/491/orig 2025-09-07T06:13:53.0560499Z * [new branch] gh/malfet/492/base -> origin/gh/malfet/492/base 2025-09-07T06:13:53.0561706Z * [new branch] gh/malfet/492/head -> origin/gh/malfet/492/head 2025-09-07T06:13:53.0562862Z * [new branch] gh/malfet/492/orig -> origin/gh/malfet/492/orig 2025-09-07T06:13:53.0564510Z * [new branch] gh/malfet/493/base -> origin/gh/malfet/493/base 2025-09-07T06:13:53.0565536Z * [new branch] gh/malfet/493/head -> origin/gh/malfet/493/head 2025-09-07T06:13:53.0566697Z * [new branch] gh/malfet/493/orig -> origin/gh/malfet/493/orig 2025-09-07T06:13:53.0568140Z * [new branch] gh/malfet/494/base -> origin/gh/malfet/494/base 2025-09-07T06:13:53.0569261Z * [new branch] gh/malfet/494/head -> origin/gh/malfet/494/head 2025-09-07T06:13:53.0570423Z * [new branch] gh/malfet/494/orig -> origin/gh/malfet/494/orig 2025-09-07T06:13:53.0572722Z * [new branch] gh/malfet/495/base -> origin/gh/malfet/495/base 2025-09-07T06:13:53.0573903Z * [new branch] gh/malfet/495/head -> origin/gh/malfet/495/head 2025-09-07T06:13:53.0575070Z * [new branch] gh/malfet/495/orig -> origin/gh/malfet/495/orig 2025-09-07T06:13:53.0576881Z * [new branch] gh/malfet/496/base -> origin/gh/malfet/496/base 
2025-09-07T06:13:53.0578040Z * [new branch] gh/malfet/496/head -> origin/gh/malfet/496/head 2025-09-07T06:13:53.0579195Z * [new branch] gh/malfet/496/orig -> origin/gh/malfet/496/orig 2025-09-07T06:13:53.0580784Z * [new branch] gh/malfet/497/base -> origin/gh/malfet/497/base 2025-09-07T06:13:53.0581936Z * [new branch] gh/malfet/497/head -> origin/gh/malfet/497/head 2025-09-07T06:13:53.0583257Z * [new branch] gh/malfet/497/orig -> origin/gh/malfet/497/orig 2025-09-07T06:13:53.0584960Z * [new branch] gh/malfet/498/base -> origin/gh/malfet/498/base 2025-09-07T06:13:53.0586064Z * [new branch] gh/malfet/498/head -> origin/gh/malfet/498/head 2025-09-07T06:13:53.0587201Z * [new branch] gh/malfet/498/orig -> origin/gh/malfet/498/orig 2025-09-07T06:13:53.0589059Z * [new branch] gh/malfet/499/base -> origin/gh/malfet/499/base 2025-09-07T06:13:53.0590181Z * [new branch] gh/malfet/499/head -> origin/gh/malfet/499/head 2025-09-07T06:13:53.0591324Z * [new branch] gh/malfet/499/orig -> origin/gh/malfet/499/orig 2025-09-07T06:13:53.0592906Z * [new branch] gh/malfet/500/base -> origin/gh/malfet/500/base 2025-09-07T06:13:53.0594020Z * [new branch] gh/malfet/500/head -> origin/gh/malfet/500/head 2025-09-07T06:13:53.0595126Z * [new branch] gh/malfet/500/orig -> origin/gh/malfet/500/orig 2025-09-07T06:13:53.0596756Z * [new branch] gh/malfet/501/base -> origin/gh/malfet/501/base 2025-09-07T06:13:53.0597956Z * [new branch] gh/malfet/501/head -> origin/gh/malfet/501/head 2025-09-07T06:13:53.0599061Z * [new branch] gh/malfet/501/orig -> origin/gh/malfet/501/orig 2025-09-07T06:13:53.0600730Z * [new branch] gh/malfet/502/base -> origin/gh/malfet/502/base 2025-09-07T06:13:53.0601841Z * [new branch] gh/malfet/502/head -> origin/gh/malfet/502/head 2025-09-07T06:13:53.0602973Z * [new branch] gh/malfet/502/orig -> origin/gh/malfet/502/orig 2025-09-07T06:13:53.0604482Z * [new branch] gh/malfet/503/base -> origin/gh/malfet/503/base 2025-09-07T06:13:53.0605607Z * [new branch] gh/malfet/503/head -> 
origin/gh/malfet/503/head 2025-09-07T06:13:53.0606684Z * [new branch] gh/malfet/503/orig -> origin/gh/malfet/503/orig 2025-09-07T06:13:53.0608303Z * [new branch] gh/malfet/504/base -> origin/gh/malfet/504/base 2025-09-07T06:13:53.0609340Z * [new branch] gh/malfet/504/head -> origin/gh/malfet/504/head 2025-09-07T06:13:53.0610442Z * [new branch] gh/malfet/504/orig -> origin/gh/malfet/504/orig 2025-09-07T06:13:53.0612329Z * [new branch] gh/malfet/505/base -> origin/gh/malfet/505/base 2025-09-07T06:13:53.0613528Z * [new branch] gh/malfet/505/head -> origin/gh/malfet/505/head 2025-09-07T06:13:53.0614681Z * [new branch] gh/malfet/505/orig -> origin/gh/malfet/505/orig 2025-09-07T06:13:53.0616366Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-09-07T06:13:53.0617490Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-09-07T06:13:53.0618676Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-09-07T06:13:53.0620260Z * [new branch] gh/malfet/507/base -> origin/gh/malfet/507/base 2025-09-07T06:13:53.0621427Z * [new branch] gh/malfet/507/head -> origin/gh/malfet/507/head 2025-09-07T06:13:53.0622584Z * [new branch] gh/malfet/507/orig -> origin/gh/malfet/507/orig 2025-09-07T06:13:53.0624463Z * [new branch] gh/malfet/508/base -> origin/gh/malfet/508/base 2025-09-07T06:13:53.0625551Z * [new branch] gh/malfet/508/head -> origin/gh/malfet/508/head 2025-09-07T06:13:53.0626673Z * [new branch] gh/malfet/508/orig -> origin/gh/malfet/508/orig 2025-09-07T06:13:53.0628109Z * [new branch] gh/malfet/509/base -> origin/gh/malfet/509/base 2025-09-07T06:13:53.0629196Z * [new branch] gh/malfet/509/head -> origin/gh/malfet/509/head 2025-09-07T06:13:53.0630416Z * [new branch] gh/malfet/509/orig -> origin/gh/malfet/509/orig 2025-09-07T06:13:53.0632091Z * [new branch] gh/malfet/510/base -> origin/gh/malfet/510/base 2025-09-07T06:13:53.0633200Z * [new branch] gh/malfet/510/head -> origin/gh/malfet/510/head 2025-09-07T06:13:53.0634376Z * [new 
branch] gh/malfet/510/orig -> origin/gh/malfet/510/orig 2025-09-07T06:13:53.0635892Z * [new branch] gh/malfet/511/base -> origin/gh/malfet/511/base 2025-09-07T06:13:53.0637015Z * [new branch] gh/malfet/511/head -> origin/gh/malfet/511/head 2025-09-07T06:13:53.0638136Z * [new branch] gh/malfet/511/orig -> origin/gh/malfet/511/orig 2025-09-07T06:13:53.0639642Z * [new branch] gh/malfet/512/base -> origin/gh/malfet/512/base 2025-09-07T06:13:53.0640751Z * [new branch] gh/malfet/512/head -> origin/gh/malfet/512/head 2025-09-07T06:13:53.0641872Z * [new branch] gh/malfet/512/orig -> origin/gh/malfet/512/orig 2025-09-07T06:13:53.0643518Z * [new branch] gh/malfet/513/base -> origin/gh/malfet/513/base 2025-09-07T06:13:53.0644608Z * [new branch] gh/malfet/513/head -> origin/gh/malfet/513/head 2025-09-07T06:13:53.0645743Z * [new branch] gh/malfet/513/orig -> origin/gh/malfet/513/orig 2025-09-07T06:13:53.0647368Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-09-07T06:13:53.0648450Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-09-07T06:13:53.0651617Z * [new branch] gh/manuelcandales/10/base -> origin/gh/manuelcandales/10/base 2025-09-07T06:13:53.0652845Z * [new branch] gh/manuelcandales/10/head -> origin/gh/manuelcandales/10/head 2025-09-07T06:13:53.0654003Z * [new branch] gh/manuelcandales/10/orig -> origin/gh/manuelcandales/10/orig 2025-09-07T06:13:53.0655767Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-09-07T06:13:53.0656714Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-09-07T06:13:53.0657974Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-09-07T06:13:53.0659508Z * [new branch] gh/manuelcandales/9/base -> origin/gh/manuelcandales/9/base 2025-09-07T06:13:53.0660713Z * [new branch] gh/manuelcandales/9/head -> origin/gh/manuelcandales/9/head 2025-09-07T06:13:53.0661926Z * [new branch] gh/manuelcandales/9/orig -> 
origin/gh/manuelcandales/9/orig 2025-09-07T06:13:53.0664148Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-09-07T06:13:53.0666224Z * [new branch] gh/masnesral/204/base -> origin/gh/masnesral/204/base 2025-09-07T06:13:53.0667469Z * [new branch] gh/masnesral/204/head -> origin/gh/masnesral/204/head 2025-09-07T06:13:53.0668665Z * [new branch] gh/masnesral/204/orig -> origin/gh/masnesral/204/orig 2025-09-07T06:13:53.0670279Z * [new branch] gh/masnesral/235/base -> origin/gh/masnesral/235/base 2025-09-07T06:13:53.0671931Z * [new branch] gh/masnesral/235/head -> origin/gh/masnesral/235/head 2025-09-07T06:13:53.0673228Z * [new branch] gh/masnesral/235/orig -> origin/gh/masnesral/235/orig 2025-09-07T06:13:53.0674677Z * [new branch] gh/masnesral/34/base -> origin/gh/masnesral/34/base 2025-09-07T06:13:53.0676626Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-09-07T06:13:53.0677820Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-09-07T06:13:53.0679242Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-09-07T06:13:53.0680329Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-09-07T06:13:53.0681697Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-09-07T06:13:53.0682848Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-09-07T06:13:53.0684262Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-09-07T06:13:53.0685302Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-09-07T06:13:53.0686690Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-09-07T06:13:53.0687738Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-09-07T06:13:53.0689140Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-09-07T06:13:53.0690262Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-09-07T06:13:53.0691875Z * [new branch] gh/mhorowitz/6/base -> 
origin/gh/mhorowitz/6/base 2025-09-07T06:13:53.0693401Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-09-07T06:13:53.0695053Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-09-07T06:13:53.0696201Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-09-07T06:13:53.0697700Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-09-07T06:13:53.0698772Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-09-07T06:13:53.0700329Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-09-07T06:13:53.0701348Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-09-07T06:13:53.0702867Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-09-07T06:13:53.0704249Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-09-07T06:13:53.0705715Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-09-07T06:13:53.0706855Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-09-07T06:13:53.0708384Z * [new branch] gh/mikaylagawarecki/317/base -> origin/gh/mikaylagawarecki/317/base 2025-09-07T06:13:53.0709560Z * [new branch] gh/mikaylagawarecki/317/head -> origin/gh/mikaylagawarecki/317/head 2025-09-07T06:13:53.0710719Z * [new branch] gh/mikaylagawarecki/317/orig -> origin/gh/mikaylagawarecki/317/orig 2025-09-07T06:13:53.0712354Z * [new branch] gh/mikaylagawarecki/320/base -> origin/gh/mikaylagawarecki/320/base 2025-09-07T06:13:53.0713489Z * [new branch] gh/mikaylagawarecki/320/head -> origin/gh/mikaylagawarecki/320/head 2025-09-07T06:13:53.0714779Z * [new branch] gh/mikaylagawarecki/320/orig -> origin/gh/mikaylagawarecki/320/orig 2025-09-07T06:13:53.0716293Z * [new branch] gh/mikaylagawarecki/329/base -> 
origin/gh/mikaylagawarecki/329/base 2025-09-07T06:13:53.0717499Z * [new branch] gh/mikaylagawarecki/329/head -> origin/gh/mikaylagawarecki/329/head 2025-09-07T06:13:53.0718628Z * [new branch] gh/mikaylagawarecki/329/orig -> origin/gh/mikaylagawarecki/329/orig 2025-09-07T06:13:53.0720283Z * [new branch] gh/mikaylagawarecki/330/base -> origin/gh/mikaylagawarecki/330/base 2025-09-07T06:13:53.0721382Z * [new branch] gh/mikaylagawarecki/330/head -> origin/gh/mikaylagawarecki/330/head 2025-09-07T06:13:53.0722487Z * [new branch] gh/mikaylagawarecki/330/orig -> origin/gh/mikaylagawarecki/330/orig 2025-09-07T06:13:53.0724130Z * [new branch] gh/mikaylagawarecki/331/base -> origin/gh/mikaylagawarecki/331/base 2025-09-07T06:13:53.0725240Z * [new branch] gh/mikaylagawarecki/331/head -> origin/gh/mikaylagawarecki/331/head 2025-09-07T06:13:53.0726403Z * [new branch] gh/mikaylagawarecki/331/orig -> origin/gh/mikaylagawarecki/331/orig 2025-09-07T06:13:53.0728193Z * [new branch] gh/mikaylagawarecki/332/base -> origin/gh/mikaylagawarecki/332/base 2025-09-07T06:13:53.0729285Z * [new branch] gh/mikaylagawarecki/332/head -> origin/gh/mikaylagawarecki/332/head 2025-09-07T06:13:53.0730407Z * [new branch] gh/mikaylagawarecki/332/orig -> origin/gh/mikaylagawarecki/332/orig 2025-09-07T06:13:53.0732278Z * [new branch] gh/mikaylagawarecki/334/base -> origin/gh/mikaylagawarecki/334/base 2025-09-07T06:13:53.0733401Z * [new branch] gh/mikaylagawarecki/334/head -> origin/gh/mikaylagawarecki/334/head 2025-09-07T06:13:53.0734581Z * [new branch] gh/mikaylagawarecki/334/orig -> origin/gh/mikaylagawarecki/334/orig 2025-09-07T06:13:53.0736275Z * [new branch] gh/mikaylagawarecki/335/base -> origin/gh/mikaylagawarecki/335/base 2025-09-07T06:13:53.0737529Z * [new branch] gh/mikaylagawarecki/335/head -> origin/gh/mikaylagawarecki/335/head 2025-09-07T06:13:53.0738746Z * [new branch] gh/mikaylagawarecki/335/orig -> origin/gh/mikaylagawarecki/335/orig 2025-09-07T06:13:53.0740327Z * [new branch] 
gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-09-07T06:13:53.0741498Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-09-07T06:13:53.0742650Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-09-07T06:13:53.0744155Z * [new branch] gh/mikaylagawarecki/337/base -> origin/gh/mikaylagawarecki/337/base 2025-09-07T06:13:53.0745277Z * [new branch] gh/mikaylagawarecki/337/head -> origin/gh/mikaylagawarecki/337/head 2025-09-07T06:13:53.0746453Z * [new branch] gh/mikaylagawarecki/337/orig -> origin/gh/mikaylagawarecki/337/orig 2025-09-07T06:13:53.0747926Z * [new branch] gh/mikaylagawarecki/338/base -> origin/gh/mikaylagawarecki/338/base 2025-09-07T06:13:53.0749420Z * [new branch] gh/mikaylagawarecki/338/head -> origin/gh/mikaylagawarecki/338/head 2025-09-07T06:13:53.0750706Z * [new branch] gh/mikaylagawarecki/338/orig -> origin/gh/mikaylagawarecki/338/orig 2025-09-07T06:13:53.0752672Z * [new branch] gh/mikaylagawarecki/339/base -> origin/gh/mikaylagawarecki/339/base 2025-09-07T06:13:53.0753902Z * [new branch] gh/mikaylagawarecki/339/head -> origin/gh/mikaylagawarecki/339/head 2025-09-07T06:13:53.0755064Z * [new branch] gh/mikaylagawarecki/339/orig -> origin/gh/mikaylagawarecki/339/orig 2025-09-07T06:13:53.0756955Z * [new branch] gh/mlazos/1/base -> origin/gh/mlazos/1/base 2025-09-07T06:13:53.0758186Z * [new branch] gh/mlazos/1/head -> origin/gh/mlazos/1/head 2025-09-07T06:13:53.0759372Z * [new branch] gh/mlazos/1/orig -> origin/gh/mlazos/1/orig 2025-09-07T06:13:53.0760996Z * [new branch] gh/mlazos/12/base -> origin/gh/mlazos/12/base 2025-09-07T06:13:53.0762211Z * [new branch] gh/mlazos/12/head -> origin/gh/mlazos/12/head 2025-09-07T06:13:53.0763322Z * [new branch] gh/mlazos/12/orig -> origin/gh/mlazos/12/orig 2025-09-07T06:13:53.0764947Z * [new branch] gh/mlazos/13/base -> origin/gh/mlazos/13/base 2025-09-07T06:13:53.0766150Z * [new branch] gh/mlazos/13/head -> 
origin/gh/mlazos/13/head 2025-09-07T06:13:53.0767352Z * [new branch] gh/mlazos/13/orig -> origin/gh/mlazos/13/orig 2025-09-07T06:13:53.0768903Z * [new branch] gh/mlazos/14/base -> origin/gh/mlazos/14/base 2025-09-07T06:13:53.0770010Z * [new branch] gh/mlazos/14/head -> origin/gh/mlazos/14/head 2025-09-07T06:13:53.0771136Z * [new branch] gh/mlazos/14/orig -> origin/gh/mlazos/14/orig 2025-09-07T06:13:53.0773095Z * [new branch] gh/mlazos/15/base -> origin/gh/mlazos/15/base 2025-09-07T06:13:53.0774222Z * [new branch] gh/mlazos/15/head -> origin/gh/mlazos/15/head 2025-09-07T06:13:53.0775389Z * [new branch] gh/mlazos/15/orig -> origin/gh/mlazos/15/orig 2025-09-07T06:13:53.0777175Z * [new branch] gh/mlazos/16/base -> origin/gh/mlazos/16/base 2025-09-07T06:13:53.0778341Z * [new branch] gh/mlazos/16/head -> origin/gh/mlazos/16/head 2025-09-07T06:13:53.0779502Z * [new branch] gh/mlazos/16/orig -> origin/gh/mlazos/16/orig 2025-09-07T06:13:53.0780989Z * [new branch] gh/mlazos/17/base -> origin/gh/mlazos/17/base 2025-09-07T06:13:53.0782091Z * [new branch] gh/mlazos/17/head -> origin/gh/mlazos/17/head 2025-09-07T06:13:53.0783454Z * [new branch] gh/mlazos/17/orig -> origin/gh/mlazos/17/orig 2025-09-07T06:13:53.0785167Z * [new branch] gh/mlazos/2/base -> origin/gh/mlazos/2/base 2025-09-07T06:13:53.0786236Z * [new branch] gh/mlazos/2/head -> origin/gh/mlazos/2/head 2025-09-07T06:13:53.0787320Z * [new branch] gh/mlazos/2/orig -> origin/gh/mlazos/2/orig 2025-09-07T06:13:53.0788961Z * [new branch] gh/mlazos/3/base -> origin/gh/mlazos/3/base 2025-09-07T06:13:53.0790017Z * [new branch] gh/mlazos/3/head -> origin/gh/mlazos/3/head 2025-09-07T06:13:53.0791127Z * [new branch] gh/mlazos/3/orig -> origin/gh/mlazos/3/orig 2025-09-07T06:13:53.0792970Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-09-07T06:13:53.0794309Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-09-07T06:13:53.0796586Z * [new branch] gh/muchulee8/62/base -> origin/gh/muchulee8/62/base 
2025-09-07T06:13:53.0797942Z * [new branch] gh/muchulee8/62/head -> origin/gh/muchulee8/62/head 2025-09-07T06:13:53.0799095Z * [new branch] gh/muchulee8/62/orig -> origin/gh/muchulee8/62/orig 2025-09-07T06:13:53.0800617Z * [new branch] gh/muchulee8/63/base -> origin/gh/muchulee8/63/base 2025-09-07T06:13:53.0801757Z * [new branch] gh/muchulee8/63/head -> origin/gh/muchulee8/63/head 2025-09-07T06:13:53.0802963Z * [new branch] gh/muchulee8/63/orig -> origin/gh/muchulee8/63/orig 2025-09-07T06:13:53.0804692Z * [new branch] gh/muchulee8/64/base -> origin/gh/muchulee8/64/base 2025-09-07T06:13:53.0805767Z * [new branch] gh/muchulee8/64/head -> origin/gh/muchulee8/64/head 2025-09-07T06:13:53.0806936Z * [new branch] gh/muchulee8/64/orig -> origin/gh/muchulee8/64/orig 2025-09-07T06:13:53.0808557Z * [new branch] gh/muchulee8/65/base -> origin/gh/muchulee8/65/base 2025-09-07T06:13:53.0809614Z * [new branch] gh/muchulee8/65/head -> origin/gh/muchulee8/65/head 2025-09-07T06:13:53.0810836Z * [new branch] gh/muchulee8/65/orig -> origin/gh/muchulee8/65/orig 2025-09-07T06:13:53.0813262Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-09-07T06:13:53.0814475Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-09-07T06:13:53.0815810Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-09-07T06:13:53.0817323Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-09-07T06:13:53.0818483Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-09-07T06:13:53.0819675Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-09-07T06:13:53.0821189Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-09-07T06:13:53.0822333Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-09-07T06:13:53.0823526Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 
2025-09-07T06:13:53.0825057Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-09-07T06:13:53.0826239Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-09-07T06:13:53.0827491Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-09-07T06:13:53.0829109Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-09-07T06:13:53.0830253Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-09-07T06:13:53.0831451Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-09-07T06:13:53.0832978Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-09-07T06:13:53.0834088Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-09-07T06:13:53.0835142Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-09-07T06:13:53.0836888Z * [new branch] gh/oulgen/35/base -> origin/gh/oulgen/35/base 2025-09-07T06:13:53.0838020Z * [new branch] gh/oulgen/35/head -> origin/gh/oulgen/35/head 2025-09-07T06:13:53.0839118Z * [new branch] gh/oulgen/35/orig -> origin/gh/oulgen/35/orig 2025-09-07T06:13:53.0840690Z * [new branch] gh/oulgen/48/base -> origin/gh/oulgen/48/base 2025-09-07T06:13:53.0841862Z * [new branch] gh/oulgen/48/head -> origin/gh/oulgen/48/head 2025-09-07T06:13:53.0842943Z * [new branch] gh/oulgen/48/orig -> origin/gh/oulgen/48/orig 2025-09-07T06:13:53.0844413Z * [new branch] gh/oulgen/49/base -> origin/gh/oulgen/49/base 2025-09-07T06:13:53.0845561Z * [new branch] gh/oulgen/49/head -> origin/gh/oulgen/49/head 2025-09-07T06:13:53.0846728Z * [new branch] gh/oulgen/49/orig -> origin/gh/oulgen/49/orig 2025-09-07T06:13:53.0848996Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-09-07T06:13:53.0852344Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-09-07T06:13:53.0853580Z * [new branch] gh/pearu/108/orig -> 
origin/gh/pearu/108/orig 2025-09-07T06:13:53.0855430Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-09-07T06:13:53.0856596Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-09-07T06:13:53.0857808Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-09-07T06:13:53.0859386Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-09-07T06:13:53.0860559Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-09-07T06:13:53.0862308Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-09-07T06:13:53.0864076Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-09-07T06:13:53.0865163Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-09-07T06:13:53.0866366Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-09-07T06:13:53.0867830Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-09-07T06:13:53.0868952Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-09-07T06:13:53.0870081Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-09-07T06:13:53.0871608Z * [new branch] gh/pearu/113/base -> origin/gh/pearu/113/base 2025-09-07T06:13:53.0872699Z * [new branch] gh/pearu/113/head -> origin/gh/pearu/113/head 2025-09-07T06:13:53.0873810Z * [new branch] gh/pearu/113/orig -> origin/gh/pearu/113/orig 2025-09-07T06:13:53.0875320Z * [new branch] gh/pearu/114/base -> origin/gh/pearu/114/base 2025-09-07T06:13:53.0876483Z * [new branch] gh/pearu/114/head -> origin/gh/pearu/114/head 2025-09-07T06:13:53.0877820Z * [new branch] gh/pearu/114/orig -> origin/gh/pearu/114/orig 2025-09-07T06:13:53.0879341Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-09-07T06:13:53.0880490Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-09-07T06:13:53.0881541Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-09-07T06:13:53.0883079Z * [new branch] gh/pearu/116/base -> 
origin/gh/pearu/116/base 2025-09-07T06:13:53.0884150Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-09-07T06:13:53.0885303Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-09-07T06:13:53.0886790Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-09-07T06:13:53.0887803Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-09-07T06:13:53.0888878Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-09-07T06:13:53.0890831Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-09-07T06:13:53.0892636Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-09-07T06:13:53.0893852Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-09-07T06:13:53.0895627Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-09-07T06:13:53.0896745Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-09-07T06:13:53.0897917Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-09-07T06:13:53.0899782Z * [new branch] gh/qqaatw/29/base -> origin/gh/qqaatw/29/base 2025-09-07T06:13:53.0900979Z * [new branch] gh/qqaatw/29/head -> origin/gh/qqaatw/29/head 2025-09-07T06:13:53.0902080Z * [new branch] gh/qqaatw/29/orig -> origin/gh/qqaatw/29/orig 2025-09-07T06:13:53.0903725Z * [new branch] gh/raymo/refresh-script -> origin/gh/raymo/refresh-script 2025-09-07T06:13:53.0905652Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-09-07T06:13:53.0906783Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-09-07T06:13:53.0908345Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-09-07T06:13:53.0909451Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-09-07T06:13:53.0910570Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-09-07T06:13:53.0912097Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-09-07T06:13:53.0913251Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 
2025-09-07T06:13:53.0914350Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-09-07T06:13:53.0915834Z * [new branch] gh/rec/156/base -> origin/gh/rec/156/base 2025-09-07T06:13:53.0916958Z * [new branch] gh/rec/156/head -> origin/gh/rec/156/head 2025-09-07T06:13:53.0918072Z * [new branch] gh/rec/156/orig -> origin/gh/rec/156/orig 2025-09-07T06:13:53.0919570Z * [new branch] gh/rec/160/base -> origin/gh/rec/160/base 2025-09-07T06:13:53.0920680Z * [new branch] gh/rec/160/head -> origin/gh/rec/160/head 2025-09-07T06:13:53.0921921Z * [new branch] gh/rec/160/orig -> origin/gh/rec/160/orig 2025-09-07T06:13:53.0923539Z * [new branch] gh/rec/162/base -> origin/gh/rec/162/base 2025-09-07T06:13:53.0924645Z * [new branch] gh/rec/162/head -> origin/gh/rec/162/head 2025-09-07T06:13:53.0925751Z * [new branch] gh/rec/162/orig -> origin/gh/rec/162/orig 2025-09-07T06:13:53.0927283Z * [new branch] gh/rec/163/base -> origin/gh/rec/163/base 2025-09-07T06:13:53.0928439Z * [new branch] gh/rec/163/head -> origin/gh/rec/163/head 2025-09-07T06:13:53.0929547Z * [new branch] gh/rec/163/orig -> origin/gh/rec/163/orig 2025-09-07T06:13:53.0931030Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-09-07T06:13:53.0932450Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-09-07T06:13:53.0933576Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-09-07T06:13:53.0935174Z * [new branch] gh/rec/165/base -> origin/gh/rec/165/base 2025-09-07T06:13:53.0936310Z * [new branch] gh/rec/165/head -> origin/gh/rec/165/head 2025-09-07T06:13:53.0937472Z * [new branch] gh/rec/165/orig -> origin/gh/rec/165/orig 2025-09-07T06:13:53.0939130Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-09-07T06:13:53.0940327Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-09-07T06:13:53.0941424Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-09-07T06:13:53.0943346Z * [new branch] gh/robert-hardwick/1/base -> origin/gh/robert-hardwick/1/base 
2025-09-07T06:13:53.0944591Z * [new branch] gh/robert-hardwick/1/head -> origin/gh/robert-hardwick/1/head 2025-09-07T06:13:53.0945754Z * [new branch] gh/robert-hardwick/1/orig -> origin/gh/robert-hardwick/1/orig 2025-09-07T06:13:53.0947279Z * [new branch] gh/robert-hardwick/2/base -> origin/gh/robert-hardwick/2/base 2025-09-07T06:13:53.0948397Z * [new branch] gh/robert-hardwick/2/head -> origin/gh/robert-hardwick/2/head 2025-09-07T06:13:53.0950068Z * [new branch] gh/robert-hardwick/2/orig -> origin/gh/robert-hardwick/2/orig 2025-09-07T06:13:53.0951682Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-09-07T06:13:53.0952849Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-09-07T06:13:53.0954098Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-09-07T06:13:53.0955664Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-09-07T06:13:53.0956781Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-09-07T06:13:53.0957940Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-09-07T06:13:53.0959758Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-09-07T06:13:53.0960950Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-09-07T06:13:53.0962673Z * [new branch] gh/rtimpe/10/base -> origin/gh/rtimpe/10/base 2025-09-07T06:13:53.0963833Z * [new branch] gh/rtimpe/10/head -> origin/gh/rtimpe/10/head 2025-09-07T06:13:53.0964931Z * [new branch] gh/rtimpe/10/orig -> origin/gh/rtimpe/10/orig 2025-09-07T06:13:53.0966511Z * [new branch] gh/rtimpe/11/base -> origin/gh/rtimpe/11/base 2025-09-07T06:13:53.0967690Z * [new branch] gh/rtimpe/11/head -> origin/gh/rtimpe/11/head 2025-09-07T06:13:53.0968927Z * [new branch] gh/rtimpe/11/orig -> origin/gh/rtimpe/11/orig 2025-09-07T06:13:53.0970388Z * [new branch] gh/rtimpe/12/base -> origin/gh/rtimpe/12/base 
2025-09-07T06:13:53.0971559Z * [new branch] gh/rtimpe/12/head -> origin/gh/rtimpe/12/head 2025-09-07T06:13:53.0972980Z * [new branch] gh/rtimpe/12/orig -> origin/gh/rtimpe/12/orig 2025-09-07T06:13:53.0974492Z * [new branch] gh/rtimpe/13/base -> origin/gh/rtimpe/13/base 2025-09-07T06:13:53.0975660Z * [new branch] gh/rtimpe/13/head -> origin/gh/rtimpe/13/head 2025-09-07T06:13:53.0976830Z * [new branch] gh/rtimpe/13/orig -> origin/gh/rtimpe/13/orig 2025-09-07T06:13:53.0978327Z * [new branch] gh/rtimpe/14/base -> origin/gh/rtimpe/14/base 2025-09-07T06:13:53.0979517Z * [new branch] gh/rtimpe/14/head -> origin/gh/rtimpe/14/head 2025-09-07T06:13:53.0980633Z * [new branch] gh/rtimpe/14/orig -> origin/gh/rtimpe/14/orig 2025-09-07T06:13:53.0982164Z * [new branch] gh/rtimpe/15/base -> origin/gh/rtimpe/15/base 2025-09-07T06:13:53.0983306Z * [new branch] gh/rtimpe/15/head -> origin/gh/rtimpe/15/head 2025-09-07T06:13:53.0984698Z * [new branch] gh/rtimpe/15/orig -> origin/gh/rtimpe/15/orig 2025-09-07T06:13:53.0986154Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-09-07T06:13:53.0987324Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-09-07T06:13:53.0988653Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-09-07T06:13:53.0989808Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-09-07T06:13:53.0991315Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-09-07T06:13:53.0992415Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-09-07T06:13:53.0993978Z * [new branch] gh/rtimpe/9/base -> origin/gh/rtimpe/9/base 2025-09-07T06:13:53.0995109Z * [new branch] gh/rtimpe/9/head -> origin/gh/rtimpe/9/head 2025-09-07T06:13:53.0996234Z * [new branch] gh/rtimpe/9/orig -> origin/gh/rtimpe/9/orig 2025-09-07T06:13:53.0998222Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-09-07T06:13:53.0999378Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 
2025-09-07T06:13:53.1000486Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-09-07T06:13:53.1001970Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-09-07T06:13:53.1003139Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-09-07T06:13:53.1004284Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-09-07T06:13:53.1005825Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-09-07T06:13:53.1006984Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-09-07T06:13:53.1008092Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-09-07T06:13:53.1009616Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-09-07T06:13:53.1010704Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-09-07T06:13:53.1012077Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-09-07T06:13:53.1013785Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-09-07T06:13:53.1015488Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-09-07T06:13:53.1016662Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-09-07T06:13:53.1018180Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-09-07T06:13:53.1019316Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-09-07T06:13:53.1020471Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-09-07T06:13:53.1022022Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-09-07T06:13:53.1023181Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-09-07T06:13:53.1024479Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-09-07T06:13:53.1026303Z * [new branch] gh/sarckk/2/base 
-> origin/gh/sarckk/2/base 2025-09-07T06:13:53.1027418Z * [new branch] gh/sarckk/2/head -> origin/gh/sarckk/2/head 2025-09-07T06:13:53.1028564Z * [new branch] gh/sarckk/2/orig -> origin/gh/sarckk/2/orig 2025-09-07T06:13:53.1030549Z * [new branch] gh/seemethere/35/base -> origin/gh/seemethere/35/base 2025-09-07T06:13:53.1031712Z * [new branch] gh/seemethere/35/head -> origin/gh/seemethere/35/head 2025-09-07T06:13:53.1032881Z * [new branch] gh/seemethere/35/orig -> origin/gh/seemethere/35/orig 2025-09-07T06:13:53.1034438Z * [new branch] gh/seemethere/37/base -> origin/gh/seemethere/37/base 2025-09-07T06:13:53.1035508Z * [new branch] gh/seemethere/37/head -> origin/gh/seemethere/37/head 2025-09-07T06:13:53.1036663Z * [new branch] gh/seemethere/37/orig -> origin/gh/seemethere/37/orig 2025-09-07T06:13:53.1038130Z * [new branch] gh/seemethere/43/base -> origin/gh/seemethere/43/base 2025-09-07T06:13:53.1039242Z * [new branch] gh/seemethere/43/head -> origin/gh/seemethere/43/head 2025-09-07T06:13:53.1040408Z * [new branch] gh/seemethere/43/orig -> origin/gh/seemethere/43/orig 2025-09-07T06:13:53.1041945Z * [new branch] gh/seemethere/44/base -> origin/gh/seemethere/44/base 2025-09-07T06:13:53.1043069Z * [new branch] gh/seemethere/44/head -> origin/gh/seemethere/44/head 2025-09-07T06:13:53.1044331Z * [new branch] gh/seemethere/44/orig -> origin/gh/seemethere/44/orig 2025-09-07T06:13:53.1045838Z * [new branch] gh/seemethere/48/base -> origin/gh/seemethere/48/base 2025-09-07T06:13:53.1046970Z * [new branch] gh/seemethere/48/head -> origin/gh/seemethere/48/head 2025-09-07T06:13:53.1048014Z * [new branch] gh/seemethere/48/orig -> origin/gh/seemethere/48/orig 2025-09-07T06:13:53.1050081Z * [new branch] gh/seemethere/49/base -> origin/gh/seemethere/49/base 2025-09-07T06:13:53.1051213Z * [new branch] gh/seemethere/49/head -> origin/gh/seemethere/49/head 2025-09-07T06:13:53.1052534Z * [new branch] gh/seemethere/49/orig -> origin/gh/seemethere/49/orig 2025-09-07T06:13:53.1054101Z * 
[new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-09-07T06:13:53.1055307Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-09-07T06:13:53.1056454Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-09-07T06:13:53.1058220Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-09-07T06:13:53.1059385Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-09-07T06:13:53.1060630Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-09-07T06:13:53.1062210Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-09-07T06:13:53.1063482Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-09-07T06:13:53.1064708Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-09-07T06:13:53.1066153Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-09-07T06:13:53.1067153Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-09-07T06:13:53.1068342Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-09-07T06:13:53.1069864Z * [new branch] gh/seemethere/56/base -> origin/gh/seemethere/56/base 2025-09-07T06:13:53.1070975Z * [new branch] gh/seemethere/56/head -> origin/gh/seemethere/56/head 2025-09-07T06:13:53.1072106Z * [new branch] gh/seemethere/56/orig -> origin/gh/seemethere/56/orig 2025-09-07T06:13:53.1073625Z * [new branch] gh/seemethere/57/base -> origin/gh/seemethere/57/base 2025-09-07T06:13:53.1074762Z * [new branch] gh/seemethere/57/head -> origin/gh/seemethere/57/head 2025-09-07T06:13:53.1075973Z * [new branch] gh/seemethere/57/orig -> origin/gh/seemethere/57/orig 2025-09-07T06:13:53.1077483Z * [new branch] gh/seemethere/58/base -> origin/gh/seemethere/58/base 2025-09-07T06:13:53.1078566Z * [new branch] gh/seemethere/58/head -> origin/gh/seemethere/58/head 2025-09-07T06:13:53.1079783Z * [new branch] gh/seemethere/58/orig -> 
origin/gh/seemethere/58/orig 2025-09-07T06:13:53.1081128Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-09-07T06:13:53.1082231Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-09-07T06:13:53.1083422Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-09-07T06:13:53.1084903Z * [new branch] gh/seemethere/60/base -> origin/gh/seemethere/60/base 2025-09-07T06:13:53.1086052Z * [new branch] gh/seemethere/60/head -> origin/gh/seemethere/60/head 2025-09-07T06:13:53.1087192Z * [new branch] gh/seemethere/60/orig -> origin/gh/seemethere/60/orig 2025-09-07T06:13:53.1088655Z * [new branch] gh/seemethere/61/base -> origin/gh/seemethere/61/base 2025-09-07T06:13:53.1089763Z * [new branch] gh/seemethere/61/head -> origin/gh/seemethere/61/head 2025-09-07T06:13:53.1091001Z * [new branch] gh/seemethere/61/orig -> origin/gh/seemethere/61/orig 2025-09-07T06:13:53.1092933Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-09-07T06:13:53.1094154Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-09-07T06:13:53.1095242Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-09-07T06:13:53.1096751Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-09-07T06:13:53.1097894Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-09-07T06:13:53.1099105Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-09-07T06:13:53.1101218Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-09-07T06:13:53.1102523Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-09-07T06:13:53.1103788Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-09-07T06:13:53.1105695Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-09-07T06:13:53.1107073Z * [new branch] gh/shunting314/176/head -> 
origin/gh/shunting314/176/head 2025-09-07T06:13:53.1108284Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-09-07T06:13:53.1109822Z * [new branch] gh/shunting314/211/base -> origin/gh/shunting314/211/base 2025-09-07T06:13:53.1110945Z * [new branch] gh/shunting314/211/head -> origin/gh/shunting314/211/head 2025-09-07T06:13:53.1112042Z * [new branch] gh/shunting314/211/orig -> origin/gh/shunting314/211/orig 2025-09-07T06:13:53.1113495Z * [new branch] gh/shunting314/212/base -> origin/gh/shunting314/212/base 2025-09-07T06:13:53.1114706Z * [new branch] gh/shunting314/212/head -> origin/gh/shunting314/212/head 2025-09-07T06:13:53.1115826Z * [new branch] gh/shunting314/212/orig -> origin/gh/shunting314/212/orig 2025-09-07T06:13:53.1117755Z * [new branch] gh/shunting314/213/base -> origin/gh/shunting314/213/base 2025-09-07T06:13:53.1118938Z * [new branch] gh/shunting314/213/head -> origin/gh/shunting314/213/head 2025-09-07T06:13:53.1120109Z * [new branch] gh/shunting314/213/orig -> origin/gh/shunting314/213/orig 2025-09-07T06:13:53.1121693Z * [new branch] gh/shunting314/214/base -> origin/gh/shunting314/214/base 2025-09-07T06:13:53.1122888Z * [new branch] gh/shunting314/214/head -> origin/gh/shunting314/214/head 2025-09-07T06:13:53.1124011Z * [new branch] gh/shunting314/214/orig -> origin/gh/shunting314/214/orig 2025-09-07T06:13:53.1125903Z * [new branch] gh/shunting314/215/base -> origin/gh/shunting314/215/base 2025-09-07T06:13:53.1126986Z * [new branch] gh/shunting314/215/head -> origin/gh/shunting314/215/head 2025-09-07T06:13:53.1128128Z * [new branch] gh/shunting314/215/orig -> origin/gh/shunting314/215/orig 2025-09-07T06:13:53.1129618Z * [new branch] gh/shunting314/216/base -> origin/gh/shunting314/216/base 2025-09-07T06:13:53.1130689Z * [new branch] gh/shunting314/216/head -> origin/gh/shunting314/216/head 2025-09-07T06:13:53.1131997Z * [new branch] gh/shunting314/216/orig -> origin/gh/shunting314/216/orig 2025-09-07T06:13:53.1133767Z * 
[new branch] gh/shunting314/217/base -> origin/gh/shunting314/217/base 2025-09-07T06:13:53.1134940Z * [new branch] gh/shunting314/217/head -> origin/gh/shunting314/217/head 2025-09-07T06:13:53.1136088Z * [new branch] gh/shunting314/217/orig -> origin/gh/shunting314/217/orig 2025-09-07T06:13:53.1137799Z * [new branch] gh/shunting314/218/base -> origin/gh/shunting314/218/base 2025-09-07T06:13:53.1139017Z * [new branch] gh/shunting314/218/head -> origin/gh/shunting314/218/head 2025-09-07T06:13:53.1140132Z * [new branch] gh/shunting314/218/orig -> origin/gh/shunting314/218/orig 2025-09-07T06:13:53.1141602Z * [new branch] gh/shunting314/219/base -> origin/gh/shunting314/219/base 2025-09-07T06:13:53.1142736Z * [new branch] gh/shunting314/219/head -> origin/gh/shunting314/219/head 2025-09-07T06:13:53.1144010Z * [new branch] gh/shunting314/219/orig -> origin/gh/shunting314/219/orig 2025-09-07T06:13:53.1145744Z * [new branch] gh/shunting314/220/base -> origin/gh/shunting314/220/base 2025-09-07T06:13:53.1147044Z * [new branch] gh/shunting314/220/head -> origin/gh/shunting314/220/head 2025-09-07T06:13:53.1148243Z * [new branch] gh/shunting314/220/orig -> origin/gh/shunting314/220/orig 2025-09-07T06:13:53.1150166Z * [new branch] gh/shunting314/221/base -> origin/gh/shunting314/221/base 2025-09-07T06:13:53.1151248Z * [new branch] gh/shunting314/221/head -> origin/gh/shunting314/221/head 2025-09-07T06:13:53.1152398Z * [new branch] gh/shunting314/221/orig -> origin/gh/shunting314/221/orig 2025-09-07T06:13:53.1153880Z * [new branch] gh/shunting314/222/base -> origin/gh/shunting314/222/base 2025-09-07T06:13:53.1155079Z * [new branch] gh/shunting314/222/head -> origin/gh/shunting314/222/head 2025-09-07T06:13:53.1156364Z * [new branch] gh/shunting314/222/orig -> origin/gh/shunting314/222/orig 2025-09-07T06:13:53.1157781Z * [new branch] gh/shunting314/223/base -> origin/gh/shunting314/223/base 2025-09-07T06:13:53.1158925Z * [new branch] gh/shunting314/223/head -> 
origin/gh/shunting314/223/head 2025-09-07T06:13:53.1160053Z * [new branch] gh/shunting314/223/orig -> origin/gh/shunting314/223/orig 2025-09-07T06:13:53.1162109Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-09-07T06:13:53.1163241Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-09-07T06:13:53.1164587Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-09-07T06:13:53.1165624Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-09-07T06:13:53.1167010Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-09-07T06:13:53.1168127Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-09-07T06:13:53.1169629Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-09-07T06:13:53.1170722Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-09-07T06:13:53.1173069Z * [new branch] gh/sinhaanhsul/1/base -> origin/gh/sinhaanhsul/1/base 2025-09-07T06:13:53.1174139Z * [new branch] gh/sinhaanhsul/1/head -> origin/gh/sinhaanhsul/1/head 2025-09-07T06:13:53.1176131Z * [new branch] gh/skarjala/17/base -> origin/gh/skarjala/17/base 2025-09-07T06:13:53.1177260Z * [new branch] gh/skarjala/17/head -> origin/gh/skarjala/17/head 2025-09-07T06:13:53.1178439Z * [new branch] gh/skarjala/17/orig -> origin/gh/skarjala/17/orig 2025-09-07T06:13:53.1180059Z * [new branch] gh/skarjala/18/base -> origin/gh/skarjala/18/base 2025-09-07T06:13:53.1181228Z * [new branch] gh/skarjala/18/head -> origin/gh/skarjala/18/head 2025-09-07T06:13:53.1182374Z * [new branch] gh/skarjala/18/orig -> origin/gh/skarjala/18/orig 2025-09-07T06:13:53.1184140Z * [new branch] gh/skarjala/19/base -> origin/gh/skarjala/19/base 2025-09-07T06:13:53.1185310Z * [new branch] gh/skarjala/19/head -> origin/gh/skarjala/19/head 2025-09-07T06:13:53.1186464Z * [new branch] gh/skarjala/19/orig -> origin/gh/skarjala/19/orig 2025-09-07T06:13:53.1188282Z * [new branch] gh/slayton58/1/base -> 
origin/gh/slayton58/1/base 2025-09-07T06:13:53.1189400Z * [new branch] gh/slayton58/1/head -> origin/gh/slayton58/1/head 2025-09-07T06:13:53.1190533Z * [new branch] gh/slayton58/1/orig -> origin/gh/slayton58/1/orig 2025-09-07T06:13:53.1191963Z * [new branch] gh/slayton58/2/base -> origin/gh/slayton58/2/base 2025-09-07T06:13:53.1193136Z * [new branch] gh/slayton58/2/head -> origin/gh/slayton58/2/head 2025-09-07T06:13:53.1194369Z * [new branch] gh/slayton58/2/orig -> origin/gh/slayton58/2/orig 2025-09-07T06:13:53.1195895Z * [new branch] gh/slayton58/3/base -> origin/gh/slayton58/3/base 2025-09-07T06:13:53.1197000Z * [new branch] gh/slayton58/3/head -> origin/gh/slayton58/3/head 2025-09-07T06:13:53.1198319Z * [new branch] gh/slayton58/3/orig -> origin/gh/slayton58/3/orig 2025-09-07T06:13:53.1199703Z * [new branch] gh/slayton58/4/base -> origin/gh/slayton58/4/base 2025-09-07T06:13:53.1200790Z * [new branch] gh/slayton58/4/head -> origin/gh/slayton58/4/head 2025-09-07T06:13:53.1201915Z * [new branch] gh/slayton58/4/orig -> origin/gh/slayton58/4/orig 2025-09-07T06:13:53.1203384Z * [new branch] gh/slayton58/5/base -> origin/gh/slayton58/5/base 2025-09-07T06:13:53.1204515Z * [new branch] gh/slayton58/5/head -> origin/gh/slayton58/5/head 2025-09-07T06:13:53.1205637Z * [new branch] gh/slayton58/5/orig -> origin/gh/slayton58/5/orig 2025-09-07T06:13:53.1207683Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-09-07T06:13:53.1208789Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-09-07T06:13:53.1209928Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-09-07T06:13:53.1211655Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-09-07T06:13:53.1213210Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-09-07T06:13:53.1214370Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-09-07T06:13:53.1216209Z * [new branch] gh/soulitzer/287/base -> 
origin/gh/soulitzer/287/base 2025-09-07T06:13:53.1217348Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-09-07T06:13:53.1218550Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-09-07T06:13:53.1220326Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-09-07T06:13:53.1221455Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-09-07T06:13:53.1222619Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-09-07T06:13:53.1224327Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-09-07T06:13:53.1225511Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-09-07T06:13:53.1226656Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-09-07T06:13:53.1228219Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-09-07T06:13:53.1229514Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-09-07T06:13:53.1230619Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-09-07T06:13:53.1232314Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-09-07T06:13:53.1233523Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-09-07T06:13:53.1234658Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-09-07T06:13:53.1236210Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-09-07T06:13:53.1237358Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-09-07T06:13:53.1238511Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-09-07T06:13:53.1240019Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-09-07T06:13:53.1241137Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-09-07T06:13:53.1242249Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 
2025-09-07T06:13:53.1243864Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-09-07T06:13:53.1245030Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-09-07T06:13:53.1246174Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-09-07T06:13:53.1247829Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-09-07T06:13:53.1249018Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-09-07T06:13:53.1254080Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-09-07T06:13:53.1255799Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-09-07T06:13:53.1256911Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-09-07T06:13:53.1258051Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-09-07T06:13:53.1259878Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-09-07T06:13:53.1261022Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-09-07T06:13:53.1262216Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-09-07T06:13:53.1263785Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-09-07T06:13:53.1264989Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-09-07T06:13:53.1266227Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-09-07T06:13:53.1267754Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-09-07T06:13:53.1268879Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-09-07T06:13:53.1270122Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-09-07T06:13:53.1271531Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-09-07T06:13:53.1272794Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-09-07T06:13:53.1273994Z * [new 
branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-09-07T06:13:53.1276188Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-09-07T06:13:53.1277370Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-09-07T06:13:53.1278469Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-09-07T06:13:53.1280544Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-09-07T06:13:53.1281838Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-09-07T06:13:53.1282979Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-09-07T06:13:53.1284554Z * [new branch] gh/soulitzer/362/base -> origin/gh/soulitzer/362/base 2025-09-07T06:13:53.1285644Z * [new branch] gh/soulitzer/362/head -> origin/gh/soulitzer/362/head 2025-09-07T06:13:53.1286796Z * [new branch] gh/soulitzer/362/orig -> origin/gh/soulitzer/362/orig 2025-09-07T06:13:53.1288342Z * [new branch] gh/soulitzer/372/base -> origin/gh/soulitzer/372/base 2025-09-07T06:13:53.1289466Z * [new branch] gh/soulitzer/372/head -> origin/gh/soulitzer/372/head 2025-09-07T06:13:53.1290571Z * [new branch] gh/soulitzer/372/orig -> origin/gh/soulitzer/372/orig 2025-09-07T06:13:53.1292499Z * [new branch] gh/soulitzer/373/base -> origin/gh/soulitzer/373/base 2025-09-07T06:13:53.1293686Z * [new branch] gh/soulitzer/373/head -> origin/gh/soulitzer/373/head 2025-09-07T06:13:53.1294860Z * [new branch] gh/soulitzer/373/orig -> origin/gh/soulitzer/373/orig 2025-09-07T06:13:53.1296463Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-09-07T06:13:53.1297689Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-09-07T06:13:53.1298882Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-09-07T06:13:53.1300484Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-09-07T06:13:53.1301545Z * [new branch] gh/soulitzer/375/head -> 
origin/gh/soulitzer/375/head 2025-09-07T06:13:53.1302698Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-09-07T06:13:53.1304391Z * [new branch] gh/soulitzer/376/base -> origin/gh/soulitzer/376/base 2025-09-07T06:13:53.1305538Z * [new branch] gh/soulitzer/376/head -> origin/gh/soulitzer/376/head 2025-09-07T06:13:53.1306661Z * [new branch] gh/soulitzer/376/orig -> origin/gh/soulitzer/376/orig 2025-09-07T06:13:53.1308248Z * [new branch] gh/soulitzer/377/base -> origin/gh/soulitzer/377/base 2025-09-07T06:13:53.1309297Z * [new branch] gh/soulitzer/377/head -> origin/gh/soulitzer/377/head 2025-09-07T06:13:53.1310577Z * [new branch] gh/soulitzer/377/orig -> origin/gh/soulitzer/377/orig 2025-09-07T06:13:53.1312181Z * [new branch] gh/soulitzer/378/base -> origin/gh/soulitzer/378/base 2025-09-07T06:13:53.1313393Z * [new branch] gh/soulitzer/378/head -> origin/gh/soulitzer/378/head 2025-09-07T06:13:53.1315017Z * [new branch] gh/soulitzer/378/orig -> origin/gh/soulitzer/378/orig 2025-09-07T06:13:53.1316638Z * [new branch] gh/soulitzer/379/base -> origin/gh/soulitzer/379/base 2025-09-07T06:13:53.1317715Z * [new branch] gh/soulitzer/379/head -> origin/gh/soulitzer/379/head 2025-09-07T06:13:53.1318818Z * [new branch] gh/soulitzer/379/orig -> origin/gh/soulitzer/379/orig 2025-09-07T06:13:53.1320627Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-09-07T06:13:53.1322546Z * [new branch] gh/swolchok/767/base -> origin/gh/swolchok/767/base 2025-09-07T06:13:53.1323935Z * [new branch] gh/swolchok/767/head -> origin/gh/swolchok/767/head 2025-09-07T06:13:53.1325299Z * [new branch] gh/swolchok/767/orig -> origin/gh/swolchok/767/orig 2025-09-07T06:13:53.1326955Z * [new branch] gh/swolchok/768/base -> origin/gh/swolchok/768/base 2025-09-07T06:13:53.1328182Z * [new branch] gh/swolchok/768/head -> origin/gh/swolchok/768/head 2025-09-07T06:13:53.1329537Z * [new branch] gh/swolchok/768/orig -> origin/gh/swolchok/768/orig 
2025-09-07T06:13:53.1331286Z * [new branch] gh/swolchok/769/base -> origin/gh/swolchok/769/base 2025-09-07T06:13:53.1332858Z * [new branch] gh/swolchok/769/head -> origin/gh/swolchok/769/head 2025-09-07T06:13:53.1334205Z * [new branch] gh/swolchok/769/orig -> origin/gh/swolchok/769/orig 2025-09-07T06:13:53.1336007Z * [new branch] gh/swolchok/771/base -> origin/gh/swolchok/771/base 2025-09-07T06:13:53.1337296Z * [new branch] gh/swolchok/771/head -> origin/gh/swolchok/771/head 2025-09-07T06:13:53.1338499Z * [new branch] gh/swolchok/771/orig -> origin/gh/swolchok/771/orig 2025-09-07T06:13:53.1340098Z * [new branch] gh/swolchok/772/base -> origin/gh/swolchok/772/base 2025-09-07T06:13:53.1341322Z * [new branch] gh/swolchok/772/head -> origin/gh/swolchok/772/head 2025-09-07T06:13:53.1342541Z * [new branch] gh/swolchok/772/orig -> origin/gh/swolchok/772/orig 2025-09-07T06:13:53.1344370Z * [new branch] gh/swolchok/773/base -> origin/gh/swolchok/773/base 2025-09-07T06:13:53.1345570Z * [new branch] gh/swolchok/773/head -> origin/gh/swolchok/773/head 2025-09-07T06:13:53.1346948Z * [new branch] gh/swolchok/773/orig -> origin/gh/swolchok/773/orig 2025-09-07T06:13:53.1348437Z * [new branch] gh/swolchok/786/base -> origin/gh/swolchok/786/base 2025-09-07T06:13:53.1349932Z * [new branch] gh/swolchok/786/head -> origin/gh/swolchok/786/head 2025-09-07T06:13:53.1351152Z * [new branch] gh/swolchok/786/orig -> origin/gh/swolchok/786/orig 2025-09-07T06:13:53.1352614Z * [new branch] gh/swolchok/787/base -> origin/gh/swolchok/787/base 2025-09-07T06:13:53.1353761Z * [new branch] gh/swolchok/787/head -> origin/gh/swolchok/787/head 2025-09-07T06:13:53.1354956Z * [new branch] gh/swolchok/787/orig -> origin/gh/swolchok/787/orig 2025-09-07T06:13:53.1356520Z * [new branch] gh/swolchok/788/base -> origin/gh/swolchok/788/base 2025-09-07T06:13:53.1357664Z * [new branch] gh/swolchok/788/head -> origin/gh/swolchok/788/head 2025-09-07T06:13:53.1358831Z * [new branch] gh/swolchok/788/orig -> 
origin/gh/swolchok/788/orig 2025-09-07T06:13:53.1360346Z * [new branch] gh/swolchok/789/base -> origin/gh/swolchok/789/base 2025-09-07T06:13:53.1361576Z * [new branch] gh/swolchok/789/head -> origin/gh/swolchok/789/head 2025-09-07T06:13:53.1362907Z * [new branch] gh/swolchok/789/orig -> origin/gh/swolchok/789/orig 2025-09-07T06:13:53.1364393Z * [new branch] gh/swolchok/790/base -> origin/gh/swolchok/790/base 2025-09-07T06:13:53.1365608Z * [new branch] gh/swolchok/790/head -> origin/gh/swolchok/790/head 2025-09-07T06:13:53.1366538Z * [new branch] gh/swolchok/790/orig -> origin/gh/swolchok/790/orig 2025-09-07T06:13:53.1368217Z * [new branch] gh/swolchok/791/base -> origin/gh/swolchok/791/base 2025-09-07T06:13:53.1369291Z * [new branch] gh/swolchok/791/head -> origin/gh/swolchok/791/head 2025-09-07T06:13:53.1370482Z * [new branch] gh/swolchok/791/orig -> origin/gh/swolchok/791/orig 2025-09-07T06:13:53.1372336Z * [new branch] gh/swolchok/792/base -> origin/gh/swolchok/792/base 2025-09-07T06:13:53.1373468Z * [new branch] gh/swolchok/792/head -> origin/gh/swolchok/792/head 2025-09-07T06:13:53.1374616Z * [new branch] gh/swolchok/792/orig -> origin/gh/swolchok/792/orig 2025-09-07T06:13:53.1376225Z * [new branch] gh/swolchok/793/base -> origin/gh/swolchok/793/base 2025-09-07T06:13:53.1377358Z * [new branch] gh/swolchok/793/head -> origin/gh/swolchok/793/head 2025-09-07T06:13:53.1378632Z * [new branch] gh/swolchok/793/orig -> origin/gh/swolchok/793/orig 2025-09-07T06:13:53.1380763Z * [new branch] gh/swolchok/794/base -> origin/gh/swolchok/794/base 2025-09-07T06:13:53.1381970Z * [new branch] gh/swolchok/794/head -> origin/gh/swolchok/794/head 2025-09-07T06:13:53.1383020Z * [new branch] gh/swolchok/794/orig -> origin/gh/swolchok/794/orig 2025-09-07T06:13:53.1385285Z * [new branch] gh/swolchok/795/base -> origin/gh/swolchok/795/base 2025-09-07T06:13:53.1386491Z * [new branch] gh/swolchok/795/head -> origin/gh/swolchok/795/head 2025-09-07T06:13:53.1387623Z * [new branch] 
gh/swolchok/795/orig -> origin/gh/swolchok/795/orig 2025-09-07T06:13:53.1389215Z * [new branch] gh/swolchok/796/base -> origin/gh/swolchok/796/base 2025-09-07T06:13:53.1390429Z * [new branch] gh/swolchok/796/head -> origin/gh/swolchok/796/head 2025-09-07T06:13:53.1391622Z * [new branch] gh/swolchok/796/orig -> origin/gh/swolchok/796/orig 2025-09-07T06:13:53.1393356Z * [new branch] gh/swolchok/797/base -> origin/gh/swolchok/797/base 2025-09-07T06:13:53.1394544Z * [new branch] gh/swolchok/797/head -> origin/gh/swolchok/797/head 2025-09-07T06:13:53.1395799Z * [new branch] gh/swolchok/797/orig -> origin/gh/swolchok/797/orig 2025-09-07T06:13:53.1397479Z * [new branch] gh/swolchok/798/base -> origin/gh/swolchok/798/base 2025-09-07T06:13:53.1398524Z * [new branch] gh/swolchok/798/head -> origin/gh/swolchok/798/head 2025-09-07T06:13:53.1399755Z * [new branch] gh/swolchok/798/orig -> origin/gh/swolchok/798/orig 2025-09-07T06:13:53.1401496Z * [new branch] gh/swolchok/799/base -> origin/gh/swolchok/799/base 2025-09-07T06:13:53.1402568Z * [new branch] gh/swolchok/799/head -> origin/gh/swolchok/799/head 2025-09-07T06:13:53.1403869Z * [new branch] gh/swolchok/799/orig -> origin/gh/swolchok/799/orig 2025-09-07T06:13:53.1405747Z * [new branch] gh/swolchok/800/base -> origin/gh/swolchok/800/base 2025-09-07T06:13:53.1406778Z * [new branch] gh/swolchok/800/head -> origin/gh/swolchok/800/head 2025-09-07T06:13:53.1408046Z * [new branch] gh/swolchok/800/orig -> origin/gh/swolchok/800/orig 2025-09-07T06:13:53.1409719Z * [new branch] gh/swolchok/801/base -> origin/gh/swolchok/801/base 2025-09-07T06:13:53.1410801Z * [new branch] gh/swolchok/801/head -> origin/gh/swolchok/801/head 2025-09-07T06:13:53.1412468Z * [new branch] gh/swolchok/801/orig -> origin/gh/swolchok/801/orig 2025-09-07T06:13:53.1414171Z * [new branch] gh/swolchok/802/base -> origin/gh/swolchok/802/base 2025-09-07T06:13:53.1415119Z * [new branch] gh/swolchok/802/head -> origin/gh/swolchok/802/head 
2025-09-07T06:13:53.1416439Z * [new branch] gh/swolchok/802/orig -> origin/gh/swolchok/802/orig 2025-09-07T06:13:53.1418055Z * [new branch] gh/swolchok/803/base -> origin/gh/swolchok/803/base 2025-09-07T06:13:53.1419204Z * [new branch] gh/swolchok/803/head -> origin/gh/swolchok/803/head 2025-09-07T06:13:53.1420456Z * [new branch] gh/swolchok/803/orig -> origin/gh/swolchok/803/orig 2025-09-07T06:13:53.1422271Z * [new branch] gh/swolchok/804/base -> origin/gh/swolchok/804/base 2025-09-07T06:13:53.1423337Z * [new branch] gh/swolchok/804/head -> origin/gh/swolchok/804/head 2025-09-07T06:13:53.1424721Z * [new branch] gh/swolchok/804/orig -> origin/gh/swolchok/804/orig 2025-09-07T06:13:53.1426314Z * [new branch] gh/swolchok/805/base -> origin/gh/swolchok/805/base 2025-09-07T06:13:53.1427452Z * [new branch] gh/swolchok/805/head -> origin/gh/swolchok/805/head 2025-09-07T06:13:53.1428728Z * [new branch] gh/swolchok/805/orig -> origin/gh/swolchok/805/orig 2025-09-07T06:13:53.1430123Z * [new branch] gh/swolchok/806/base -> origin/gh/swolchok/806/base 2025-09-07T06:13:53.1431285Z * [new branch] gh/swolchok/806/head -> origin/gh/swolchok/806/head 2025-09-07T06:13:53.1432424Z * [new branch] gh/swolchok/806/orig -> origin/gh/swolchok/806/orig 2025-09-07T06:13:53.1434059Z * [new branch] gh/swolchok/807/base -> origin/gh/swolchok/807/base 2025-09-07T06:13:53.1435172Z * [new branch] gh/swolchok/807/head -> origin/gh/swolchok/807/head 2025-09-07T06:13:53.1436399Z * [new branch] gh/swolchok/807/orig -> origin/gh/swolchok/807/orig 2025-09-07T06:13:53.1438102Z * [new branch] gh/swolchok/808/base -> origin/gh/swolchok/808/base 2025-09-07T06:13:53.1439223Z * [new branch] gh/swolchok/808/head -> origin/gh/swolchok/808/head 2025-09-07T06:13:53.1440299Z * [new branch] gh/swolchok/808/orig -> origin/gh/swolchok/808/orig 2025-09-07T06:13:53.1442338Z * [new branch] gh/swolchok/809/base -> origin/gh/swolchok/809/base 2025-09-07T06:13:53.1443513Z * [new branch] gh/swolchok/809/head -> 
origin/gh/swolchok/809/head 2025-09-07T06:13:53.1444719Z * [new branch] gh/swolchok/809/orig -> origin/gh/swolchok/809/orig 2025-09-07T06:13:53.1446357Z * [new branch] gh/swolchok/810/base -> origin/gh/swolchok/810/base 2025-09-07T06:13:53.1447450Z * [new branch] gh/swolchok/810/head -> origin/gh/swolchok/810/head 2025-09-07T06:13:53.1448557Z * [new branch] gh/swolchok/810/orig -> origin/gh/swolchok/810/orig 2025-09-07T06:13:53.1450661Z * [new branch] gh/swolchok/811/base -> origin/gh/swolchok/811/base 2025-09-07T06:13:53.1451997Z * [new branch] gh/swolchok/811/head -> origin/gh/swolchok/811/head 2025-09-07T06:13:53.1453275Z * [new branch] gh/swolchok/811/orig -> origin/gh/swolchok/811/orig 2025-09-07T06:13:53.1454995Z * [new branch] gh/swolchok/812/base -> origin/gh/swolchok/812/base 2025-09-07T06:13:53.1456102Z * [new branch] gh/swolchok/812/head -> origin/gh/swolchok/812/head 2025-09-07T06:13:53.1457254Z * [new branch] gh/swolchok/812/orig -> origin/gh/swolchok/812/orig 2025-09-07T06:13:53.1459030Z * [new branch] gh/swolchok/813/base -> origin/gh/swolchok/813/base 2025-09-07T06:13:53.1460136Z * [new branch] gh/swolchok/813/head -> origin/gh/swolchok/813/head 2025-09-07T06:13:53.1461475Z * [new branch] gh/swolchok/813/orig -> origin/gh/swolchok/813/orig 2025-09-07T06:13:53.1463813Z * [new branch] gh/swolchok/814/base -> origin/gh/swolchok/814/base 2025-09-07T06:13:53.1464881Z * [new branch] gh/swolchok/814/head -> origin/gh/swolchok/814/head 2025-09-07T06:13:53.1465982Z * [new branch] gh/swolchok/814/orig -> origin/gh/swolchok/814/orig 2025-09-07T06:13:53.1467704Z * [new branch] gh/swolchok/815/base -> origin/gh/swolchok/815/base 2025-09-07T06:13:53.1468760Z * [new branch] gh/swolchok/815/head -> origin/gh/swolchok/815/head 2025-09-07T06:13:53.1469924Z * [new branch] gh/swolchok/815/orig -> origin/gh/swolchok/815/orig 2025-09-07T06:13:53.1471586Z * [new branch] gh/swolchok/816/base -> origin/gh/swolchok/816/base 2025-09-07T06:13:53.1472746Z * [new branch] 
gh/swolchok/816/head -> origin/gh/swolchok/816/head 2025-09-07T06:13:53.1473897Z * [new branch] gh/swolchok/816/orig -> origin/gh/swolchok/816/orig 2025-09-07T06:13:53.1476101Z * [new branch] gh/swolchok/817/base -> origin/gh/swolchok/817/base 2025-09-07T06:13:53.1477244Z * [new branch] gh/swolchok/817/head -> origin/gh/swolchok/817/head 2025-09-07T06:13:53.1478436Z * [new branch] gh/swolchok/817/orig -> origin/gh/swolchok/817/orig 2025-09-07T06:13:53.1480113Z * [new branch] gh/swolchok/818/base -> origin/gh/swolchok/818/base 2025-09-07T06:13:53.1481173Z * [new branch] gh/swolchok/818/head -> origin/gh/swolchok/818/head 2025-09-07T06:13:53.1482275Z * [new branch] gh/swolchok/818/orig -> origin/gh/swolchok/818/orig 2025-09-07T06:13:53.1484075Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-09-07T06:13:53.1485178Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-09-07T06:13:53.1486291Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-09-07T06:13:53.1487889Z * [new branch] gh/swolchok/820/base -> origin/gh/swolchok/820/base 2025-09-07T06:13:53.1488959Z * [new branch] gh/swolchok/820/head -> origin/gh/swolchok/820/head 2025-09-07T06:13:53.1490125Z * [new branch] gh/swolchok/820/orig -> origin/gh/swolchok/820/orig 2025-09-07T06:13:53.1492026Z * [new branch] gh/swolchok/821/base -> origin/gh/swolchok/821/base 2025-09-07T06:13:53.1493255Z * [new branch] gh/swolchok/821/head -> origin/gh/swolchok/821/head 2025-09-07T06:13:53.1494494Z * [new branch] gh/swolchok/821/orig -> origin/gh/swolchok/821/orig 2025-09-07T06:13:53.1496286Z * [new branch] gh/swolchok/822/base -> origin/gh/swolchok/822/base 2025-09-07T06:13:53.1497451Z * [new branch] gh/swolchok/822/head -> origin/gh/swolchok/822/head 2025-09-07T06:13:53.1498554Z * [new branch] gh/swolchok/822/orig -> origin/gh/swolchok/822/orig 2025-09-07T06:13:53.1500294Z * [new branch] gh/swolchok/823/base -> origin/gh/swolchok/823/base 
2025-09-07T06:13:53.1501424Z * [new branch] gh/swolchok/823/head -> origin/gh/swolchok/823/head 2025-09-07T06:13:53.1502559Z * [new branch] gh/swolchok/823/orig -> origin/gh/swolchok/823/orig 2025-09-07T06:13:53.1504240Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-09-07T06:13:53.1505354Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-09-07T06:13:53.1506483Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-09-07T06:13:53.1508059Z * [new branch] gh/swolchok/825/base -> origin/gh/swolchok/825/base 2025-09-07T06:13:53.1509292Z * [new branch] gh/swolchok/825/head -> origin/gh/swolchok/825/head 2025-09-07T06:13:53.1510459Z * [new branch] gh/swolchok/825/orig -> origin/gh/swolchok/825/orig 2025-09-07T06:13:53.1512197Z * [new branch] gh/swolchok/826/base -> origin/gh/swolchok/826/base 2025-09-07T06:13:53.1513283Z * [new branch] gh/swolchok/826/head -> origin/gh/swolchok/826/head 2025-09-07T06:13:53.1514306Z * [new branch] gh/swolchok/826/orig -> origin/gh/swolchok/826/orig 2025-09-07T06:13:53.1516019Z * [new branch] gh/swolchok/827/base -> origin/gh/swolchok/827/base 2025-09-07T06:13:53.1517101Z * [new branch] gh/swolchok/827/head -> origin/gh/swolchok/827/head 2025-09-07T06:13:53.1518167Z * [new branch] gh/swolchok/827/orig -> origin/gh/swolchok/827/orig 2025-09-07T06:13:53.1519867Z * [new branch] gh/swolchok/828/base -> origin/gh/swolchok/828/base 2025-09-07T06:13:53.1520960Z * [new branch] gh/swolchok/828/head -> origin/gh/swolchok/828/head 2025-09-07T06:13:53.1522029Z * [new branch] gh/swolchok/828/orig -> origin/gh/swolchok/828/orig 2025-09-07T06:13:53.1523474Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-09-07T06:13:53.1524625Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-09-07T06:13:53.1525861Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-09-07T06:13:53.1527544Z * [new branch] gh/swolchok/830/base -> 
origin/gh/swolchok/830/base 2025-09-07T06:13:53.1529153Z * [new branch] gh/swolchok/830/head -> origin/gh/swolchok/830/head 2025-09-07T06:13:53.1530266Z * [new branch] gh/swolchok/830/orig -> origin/gh/swolchok/830/orig 2025-09-07T06:13:53.1531932Z * [new branch] gh/swolchok/831/base -> origin/gh/swolchok/831/base 2025-09-07T06:13:53.1533245Z * [new branch] gh/swolchok/831/head -> origin/gh/swolchok/831/head 2025-09-07T06:13:53.1534361Z * [new branch] gh/swolchok/831/orig -> origin/gh/swolchok/831/orig 2025-09-07T06:13:53.1535824Z * [new branch] gh/swolchok/832/base -> origin/gh/swolchok/832/base 2025-09-07T06:13:53.1537049Z * [new branch] gh/swolchok/832/head -> origin/gh/swolchok/832/head 2025-09-07T06:13:53.1538162Z * [new branch] gh/swolchok/832/orig -> origin/gh/swolchok/832/orig 2025-09-07T06:13:53.1540027Z * [new branch] gh/syed-ahmed/3/base -> origin/gh/syed-ahmed/3/base 2025-09-07T06:13:53.1541280Z * [new branch] gh/syed-ahmed/3/head -> origin/gh/syed-ahmed/3/head 2025-09-07T06:13:53.1542480Z * [new branch] gh/syed-ahmed/3/orig -> origin/gh/syed-ahmed/3/orig 2025-09-07T06:13:53.1544267Z * [new branch] gh/syed-ahmed/4/base -> origin/gh/syed-ahmed/4/base 2025-09-07T06:13:53.1545398Z * [new branch] gh/syed-ahmed/4/head -> origin/gh/syed-ahmed/4/head 2025-09-07T06:13:53.1546528Z * [new branch] gh/syed-ahmed/4/orig -> origin/gh/syed-ahmed/4/orig 2025-09-07T06:13:53.1548015Z * [new branch] gh/syed-ahmed/5/base -> origin/gh/syed-ahmed/5/base 2025-09-07T06:13:53.1549498Z * [new branch] gh/syed-ahmed/5/head -> origin/gh/syed-ahmed/5/head 2025-09-07T06:13:53.1550731Z * [new branch] gh/syed-ahmed/5/orig -> origin/gh/syed-ahmed/5/orig 2025-09-07T06:13:53.1552724Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-09-07T06:13:53.1553942Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-09-07T06:13:53.1555154Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-09-07T06:13:53.1557232Z * [new branch] gh/tianyu-l/2/base 
-> origin/gh/tianyu-l/2/base 2025-09-07T06:13:53.1558341Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-09-07T06:13:53.1559476Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-09-07T06:13:53.1561051Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-09-07T06:13:53.1562261Z * [new branch] gh/tianyu-l/3/head -> origin/gh/tianyu-l/3/head 2025-09-07T06:13:53.1563364Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-09-07T06:13:53.1564901Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-09-07T06:13:53.1566017Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-09-07T06:13:53.1567126Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-09-07T06:13:53.1569138Z * [new branch] gh/tugsbayasgalan/1/base -> origin/gh/tugsbayasgalan/1/base 2025-09-07T06:13:53.1570269Z * [new branch] gh/tugsbayasgalan/1/head -> origin/gh/tugsbayasgalan/1/head 2025-09-07T06:13:53.1571706Z * [new branch] gh/tugsbayasgalan/1/orig -> origin/gh/tugsbayasgalan/1/orig 2025-09-07T06:13:53.1573886Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-09-07T06:13:53.1575070Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-09-07T06:13:53.1576247Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-09-07T06:13:53.1577681Z * [new branch] gh/tugsbayasgalan/11/base -> origin/gh/tugsbayasgalan/11/base 2025-09-07T06:13:53.1578907Z * [new branch] gh/tugsbayasgalan/11/head -> origin/gh/tugsbayasgalan/11/head 2025-09-07T06:13:53.1580060Z * [new branch] gh/tugsbayasgalan/11/orig -> origin/gh/tugsbayasgalan/11/orig 2025-09-07T06:13:53.1581784Z * [new branch] gh/tugsbayasgalan/12/base -> origin/gh/tugsbayasgalan/12/base 2025-09-07T06:13:53.1582887Z * [new branch] gh/tugsbayasgalan/12/head -> origin/gh/tugsbayasgalan/12/head 2025-09-07T06:13:53.1584127Z * [new branch] gh/tugsbayasgalan/12/orig -> 
origin/gh/tugsbayasgalan/12/orig 2025-09-07T06:13:53.1585649Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-09-07T06:13:53.1586748Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-09-07T06:13:53.1587949Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-09-07T06:13:53.1589681Z * [new branch] gh/tugsbayasgalan/14/base -> origin/gh/tugsbayasgalan/14/base 2025-09-07T06:13:53.1590730Z * [new branch] gh/tugsbayasgalan/14/head -> origin/gh/tugsbayasgalan/14/head 2025-09-07T06:13:53.1591853Z * [new branch] gh/tugsbayasgalan/14/orig -> origin/gh/tugsbayasgalan/14/orig 2025-09-07T06:13:53.1593569Z * [new branch] gh/tugsbayasgalan/15/base -> origin/gh/tugsbayasgalan/15/base 2025-09-07T06:13:53.1594672Z * [new branch] gh/tugsbayasgalan/15/head -> origin/gh/tugsbayasgalan/15/head 2025-09-07T06:13:53.1595768Z * [new branch] gh/tugsbayasgalan/15/orig -> origin/gh/tugsbayasgalan/15/orig 2025-09-07T06:13:53.1597320Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-09-07T06:13:53.1598429Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-09-07T06:13:53.1599553Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-09-07T06:13:53.1600914Z * [new branch] gh/tugsbayasgalan/3/base -> origin/gh/tugsbayasgalan/3/base 2025-09-07T06:13:53.1602212Z * [new branch] gh/tugsbayasgalan/3/head -> origin/gh/tugsbayasgalan/3/head 2025-09-07T06:13:53.1603540Z * [new branch] gh/tugsbayasgalan/3/orig -> origin/gh/tugsbayasgalan/3/orig 2025-09-07T06:13:53.1605012Z * [new branch] gh/tugsbayasgalan/4/base -> origin/gh/tugsbayasgalan/4/base 2025-09-07T06:13:53.1606352Z * [new branch] gh/tugsbayasgalan/4/head -> origin/gh/tugsbayasgalan/4/head 2025-09-07T06:13:53.1607585Z * [new branch] gh/tugsbayasgalan/4/orig -> origin/gh/tugsbayasgalan/4/orig 2025-09-07T06:13:53.1609236Z * [new branch] gh/tugsbayasgalan/5/base -> 
origin/gh/tugsbayasgalan/5/base 2025-09-07T06:13:53.1610455Z * [new branch] gh/tugsbayasgalan/5/head -> origin/gh/tugsbayasgalan/5/head 2025-09-07T06:13:53.1611690Z * [new branch] gh/tugsbayasgalan/5/orig -> origin/gh/tugsbayasgalan/5/orig 2025-09-07T06:13:53.1613567Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-09-07T06:13:53.1614668Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-09-07T06:13:53.1615837Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-09-07T06:13:53.1617453Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-09-07T06:13:53.1618655Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-09-07T06:13:53.1619944Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-09-07T06:13:53.1621554Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-09-07T06:13:53.1622636Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-09-07T06:13:53.1623951Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-09-07T06:13:53.1625428Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-09-07T06:13:53.1626351Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-09-07T06:13:53.1627532Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-09-07T06:13:53.1629435Z * [new branch] gh/v0i0/1/base -> origin/gh/v0i0/1/base 2025-09-07T06:13:53.1630576Z * [new branch] gh/v0i0/1/head -> origin/gh/v0i0/1/head 2025-09-07T06:13:53.1631689Z * [new branch] gh/v0i0/1/orig -> origin/gh/v0i0/1/orig 2025-09-07T06:13:53.1633319Z * [new branch] gh/v0i0/4/base -> origin/gh/v0i0/4/base 2025-09-07T06:13:53.1634518Z * [new branch] gh/v0i0/4/head -> origin/gh/v0i0/4/head 2025-09-07T06:13:53.1635579Z * [new branch] gh/v0i0/4/orig -> origin/gh/v0i0/4/orig 
2025-09-07T06:13:53.1637141Z * [new branch] gh/v0i0/6/base -> origin/gh/v0i0/6/base 2025-09-07T06:13:53.1638290Z * [new branch] gh/v0i0/6/head -> origin/gh/v0i0/6/head 2025-09-07T06:13:53.1639442Z * [new branch] gh/v0i0/6/orig -> origin/gh/v0i0/6/orig 2025-09-07T06:13:53.1641477Z * [new branch] gh/v0i0/7/base -> origin/gh/v0i0/7/base 2025-09-07T06:13:53.1642649Z * [new branch] gh/v0i0/7/head -> origin/gh/v0i0/7/head 2025-09-07T06:13:53.1643775Z * [new branch] gh/v0i0/7/orig -> origin/gh/v0i0/7/orig 2025-09-07T06:13:53.1645231Z * [new branch] gh/v0i0/8/base -> origin/gh/v0i0/8/base 2025-09-07T06:13:53.1646243Z * [new branch] gh/v0i0/8/head -> origin/gh/v0i0/8/head 2025-09-07T06:13:53.1647385Z * [new branch] gh/v0i0/8/orig -> origin/gh/v0i0/8/orig 2025-09-07T06:13:53.1649083Z * [new branch] gh/v0i0/9/base -> origin/gh/v0i0/9/base 2025-09-07T06:13:53.1650720Z * [new branch] gh/v0i0/9/head -> origin/gh/v0i0/9/head 2025-09-07T06:13:53.1651878Z * [new branch] gh/v0i0/9/orig -> origin/gh/v0i0/9/orig 2025-09-07T06:13:53.1653751Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-09-07T06:13:53.1655340Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-09-07T06:13:53.1656891Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-09-07T06:13:53.1658508Z * [new branch] gh/vkuzo/4/base -> origin/gh/vkuzo/4/base 2025-09-07T06:13:53.1659762Z * [new branch] gh/vkuzo/4/head -> origin/gh/vkuzo/4/head 2025-09-07T06:13:53.1661033Z * [new branch] gh/vkuzo/4/orig -> origin/gh/vkuzo/4/orig 2025-09-07T06:13:53.1662796Z * [new branch] gh/vkuzo/5/base -> origin/gh/vkuzo/5/base 2025-09-07T06:13:53.1664211Z * [new branch] gh/vkuzo/5/head -> origin/gh/vkuzo/5/head 2025-09-07T06:13:53.1665474Z * [new branch] gh/vkuzo/5/orig -> origin/gh/vkuzo/5/orig 2025-09-07T06:13:53.1667189Z * [new branch] gh/vkuzo/6/base -> origin/gh/vkuzo/6/base 2025-09-07T06:13:53.1668269Z * [new branch] gh/vkuzo/6/head -> origin/gh/vkuzo/6/head 2025-09-07T06:13:53.1669500Z * [new branch] 
gh/vkuzo/6/orig -> origin/gh/vkuzo/6/orig 2025-09-07T06:13:53.1670856Z * [new branch] gh/vkuzo/7/base -> origin/gh/vkuzo/7/base 2025-09-07T06:13:53.1672073Z * [new branch] gh/vkuzo/7/head -> origin/gh/vkuzo/7/head 2025-09-07T06:13:53.1673194Z * [new branch] gh/vkuzo/7/orig -> origin/gh/vkuzo/7/orig 2025-09-07T06:13:53.1675118Z * [new branch] gh/wconstab/419/base -> origin/gh/wconstab/419/base 2025-09-07T06:13:53.1676173Z * [new branch] gh/wconstab/419/head -> origin/gh/wconstab/419/head 2025-09-07T06:13:53.1677290Z * [new branch] gh/wconstab/419/orig -> origin/gh/wconstab/419/orig 2025-09-07T06:13:53.1679048Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-09-07T06:13:53.1680207Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-09-07T06:13:53.1681474Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-09-07T06:13:53.1683025Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-09-07T06:13:53.1684201Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-09-07T06:13:53.1685322Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-09-07T06:13:53.1686851Z * [new branch] gh/wconstab/438/base -> origin/gh/wconstab/438/base 2025-09-07T06:13:53.1688005Z * [new branch] gh/wconstab/438/head -> origin/gh/wconstab/438/head 2025-09-07T06:13:53.1689106Z * [new branch] gh/wconstab/438/orig -> origin/gh/wconstab/438/orig 2025-09-07T06:13:53.1690632Z * [new branch] gh/wconstab/440/base -> origin/gh/wconstab/440/base 2025-09-07T06:13:53.1692264Z * [new branch] gh/wconstab/440/head -> origin/gh/wconstab/440/head 2025-09-07T06:13:53.1693630Z * [new branch] gh/wconstab/440/orig -> origin/gh/wconstab/440/orig 2025-09-07T06:13:53.1695419Z * [new branch] gh/wconstab/441/base -> origin/gh/wconstab/441/base 2025-09-07T06:13:53.1696646Z * [new branch] gh/wconstab/441/head -> origin/gh/wconstab/441/head 2025-09-07T06:13:53.1697866Z * [new branch] gh/wconstab/441/orig -> 
origin/gh/wconstab/441/orig 2025-09-07T06:13:53.1699753Z * [new branch] gh/wconstab/442/base -> origin/gh/wconstab/442/base 2025-09-07T06:13:53.1700898Z * [new branch] gh/wconstab/442/head -> origin/gh/wconstab/442/head 2025-09-07T06:13:53.1702127Z * [new branch] gh/wconstab/442/orig -> origin/gh/wconstab/442/orig 2025-09-07T06:13:53.1703831Z * [new branch] gh/wconstab/443/base -> origin/gh/wconstab/443/base 2025-09-07T06:13:53.1704983Z * [new branch] gh/wconstab/443/head -> origin/gh/wconstab/443/head 2025-09-07T06:13:53.1706051Z * [new branch] gh/wconstab/443/orig -> origin/gh/wconstab/443/orig 2025-09-07T06:13:53.1707560Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-09-07T06:13:53.1708709Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-09-07T06:13:53.1709878Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-09-07T06:13:53.1711444Z * [new branch] gh/wconstab/445/base -> origin/gh/wconstab/445/base 2025-09-07T06:13:53.1712617Z * [new branch] gh/wconstab/445/head -> origin/gh/wconstab/445/head 2025-09-07T06:13:53.1713737Z * [new branch] gh/wconstab/445/orig -> origin/gh/wconstab/445/orig 2025-09-07T06:13:53.1715860Z * [new branch] gh/wconstab/446/base -> origin/gh/wconstab/446/base 2025-09-07T06:13:53.1717234Z * [new branch] gh/wconstab/446/head -> origin/gh/wconstab/446/head 2025-09-07T06:13:53.1718752Z * [new branch] gh/wconstab/446/orig -> origin/gh/wconstab/446/orig 2025-09-07T06:13:53.1720298Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-09-07T06:13:53.1721467Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-09-07T06:13:53.1722604Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-09-07T06:13:53.1724540Z * [new branch] gh/weifengpy/27/base -> origin/gh/weifengpy/27/base 2025-09-07T06:13:53.1725642Z * [new branch] gh/weifengpy/27/head -> origin/gh/weifengpy/27/head 2025-09-07T06:13:53.1726777Z * [new branch] 
gh/weifengpy/27/orig -> origin/gh/weifengpy/27/orig 2025-09-07T06:13:53.1728382Z * [new branch] gh/weifengpy/30/base -> origin/gh/weifengpy/30/base 2025-09-07T06:13:53.1729498Z * [new branch] gh/weifengpy/30/head -> origin/gh/weifengpy/30/head 2025-09-07T06:13:53.1730612Z * [new branch] gh/weifengpy/30/orig -> origin/gh/weifengpy/30/orig 2025-09-07T06:13:53.1732943Z * [new branch] gh/williamwen42/196/base -> origin/gh/williamwen42/196/base 2025-09-07T06:13:53.1734158Z * [new branch] gh/williamwen42/196/head -> origin/gh/williamwen42/196/head 2025-09-07T06:13:53.1735503Z * [new branch] gh/williamwen42/196/orig -> origin/gh/williamwen42/196/orig 2025-09-07T06:13:53.1737128Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-09-07T06:13:53.1738292Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-09-07T06:13:53.1739473Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-09-07T06:13:53.1741135Z * [new branch] gh/williamwen42/258/base -> origin/gh/williamwen42/258/base 2025-09-07T06:13:53.1742418Z * [new branch] gh/williamwen42/258/head -> origin/gh/williamwen42/258/head 2025-09-07T06:13:53.1743571Z * [new branch] gh/williamwen42/258/orig -> origin/gh/williamwen42/258/orig 2025-09-07T06:13:53.1745284Z * [new branch] gh/williamwen42/266/base -> origin/gh/williamwen42/266/base 2025-09-07T06:13:53.1752054Z * [new branch] gh/williamwen42/266/head -> origin/gh/williamwen42/266/head 2025-09-07T06:13:53.1752639Z * [new branch] gh/williamwen42/266/orig -> origin/gh/williamwen42/266/orig 2025-09-07T06:13:53.1752925Z * [new branch] gh/williamwen42/267/base -> origin/gh/williamwen42/267/base 2025-09-07T06:13:53.1753205Z * [new branch] gh/williamwen42/267/head -> origin/gh/williamwen42/267/head 2025-09-07T06:13:53.1753492Z * [new branch] gh/williamwen42/267/orig -> origin/gh/williamwen42/267/orig 2025-09-07T06:13:53.1753824Z * [new branch] gh/williamwen42/270/base -> 
origin/gh/williamwen42/270/base 2025-09-07T06:13:53.1755214Z * [new branch] gh/williamwen42/270/head -> origin/gh/williamwen42/270/head 2025-09-07T06:13:53.1756374Z * [new branch] gh/williamwen42/270/orig -> origin/gh/williamwen42/270/orig 2025-09-07T06:13:53.1757977Z * [new branch] gh/williamwen42/271/base -> origin/gh/williamwen42/271/base 2025-09-07T06:13:53.1759269Z * [new branch] gh/williamwen42/271/head -> origin/gh/williamwen42/271/head 2025-09-07T06:13:53.1760419Z * [new branch] gh/williamwen42/271/orig -> origin/gh/williamwen42/271/orig 2025-09-07T06:13:53.1762208Z * [new branch] gh/williamwen42/272/base -> origin/gh/williamwen42/272/base 2025-09-07T06:13:53.1763373Z * [new branch] gh/williamwen42/272/head -> origin/gh/williamwen42/272/head 2025-09-07T06:13:53.1765718Z * [new branch] gh/williamwen42/272/orig -> origin/gh/williamwen42/272/orig 2025-09-07T06:13:53.1766977Z * [new branch] gh/williamwen42/274/base -> origin/gh/williamwen42/274/base 2025-09-07T06:13:53.1767256Z * [new branch] gh/williamwen42/274/head -> origin/gh/williamwen42/274/head 2025-09-07T06:13:53.1768265Z * [new branch] gh/williamwen42/274/orig -> origin/gh/williamwen42/274/orig 2025-09-07T06:13:53.1769840Z * [new branch] gh/williamwen42/275/base -> origin/gh/williamwen42/275/base 2025-09-07T06:13:53.1771018Z * [new branch] gh/williamwen42/275/head -> origin/gh/williamwen42/275/head 2025-09-07T06:13:53.1772804Z * [new branch] gh/williamwen42/276/base -> origin/gh/williamwen42/276/base 2025-09-07T06:13:53.1773927Z * [new branch] gh/williamwen42/276/head -> origin/gh/williamwen42/276/head 2025-09-07T06:13:53.1775109Z * [new branch] gh/williamwen42/276/orig -> origin/gh/williamwen42/276/orig 2025-09-07T06:13:53.1776941Z * [new branch] gh/williamwen42/277/base -> origin/gh/williamwen42/277/base 2025-09-07T06:13:53.1778090Z * [new branch] gh/williamwen42/277/head -> origin/gh/williamwen42/277/head 2025-09-07T06:13:53.1779229Z * [new branch] gh/williamwen42/277/orig -> 
origin/gh/williamwen42/277/orig 2025-09-07T06:13:53.1780861Z * [new branch] gh/williamwen42/278/base -> origin/gh/williamwen42/278/base 2025-09-07T06:13:53.1782077Z * [new branch] gh/williamwen42/278/head -> origin/gh/williamwen42/278/head 2025-09-07T06:13:53.1783393Z * [new branch] gh/williamwen42/278/orig -> origin/gh/williamwen42/278/orig 2025-09-07T06:13:53.1784983Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-09-07T06:13:53.1786139Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-09-07T06:13:53.1787269Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-09-07T06:13:53.1788830Z * [new branch] gh/williamwen42/280/base -> origin/gh/williamwen42/280/base 2025-09-07T06:13:53.1789975Z * [new branch] gh/williamwen42/280/head -> origin/gh/williamwen42/280/head 2025-09-07T06:13:53.1791128Z * [new branch] gh/williamwen42/280/orig -> origin/gh/williamwen42/280/orig 2025-09-07T06:13:53.1792841Z * [new branch] gh/williamwen42/281/base -> origin/gh/williamwen42/281/base 2025-09-07T06:13:53.1793953Z * [new branch] gh/williamwen42/281/head -> origin/gh/williamwen42/281/head 2025-09-07T06:13:53.1794890Z * [new branch] gh/williamwen42/281/orig -> origin/gh/williamwen42/281/orig 2025-09-07T06:13:53.1796764Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-09-07T06:13:53.1797890Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-09-07T06:13:53.1799006Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-09-07T06:13:53.1800799Z * [new branch] gh/williamwen42/283/base -> origin/gh/williamwen42/283/base 2025-09-07T06:13:53.1802013Z * [new branch] gh/williamwen42/283/head -> origin/gh/williamwen42/283/head 2025-09-07T06:13:53.1803195Z * [new branch] gh/williamwen42/283/orig -> origin/gh/williamwen42/283/orig 2025-09-07T06:13:53.1805065Z * [new branch] gh/williamwen42/284/base -> 
origin/gh/williamwen42/284/base 2025-09-07T06:13:53.1806128Z * [new branch] gh/williamwen42/284/head -> origin/gh/williamwen42/284/head 2025-09-07T06:13:53.1807239Z * [new branch] gh/williamwen42/284/orig -> origin/gh/williamwen42/284/orig 2025-09-07T06:13:53.1808846Z * [new branch] gh/williamwen42/285/base -> origin/gh/williamwen42/285/base 2025-09-07T06:13:53.1810400Z * [new branch] gh/williamwen42/285/head -> origin/gh/williamwen42/285/head 2025-09-07T06:13:53.1811522Z * [new branch] gh/williamwen42/285/orig -> origin/gh/williamwen42/285/orig 2025-09-07T06:13:53.1813331Z * [new branch] gh/williamwen42/286/base -> origin/gh/williamwen42/286/base 2025-09-07T06:13:53.1814483Z * [new branch] gh/williamwen42/286/head -> origin/gh/williamwen42/286/head 2025-09-07T06:13:53.1815569Z * [new branch] gh/williamwen42/286/orig -> origin/gh/williamwen42/286/orig 2025-09-07T06:13:53.1817368Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-09-07T06:13:53.1818545Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-09-07T06:13:53.1819739Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-09-07T06:13:53.1821745Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-09-07T06:13:53.1822906Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-09-07T06:13:53.1824186Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-09-07T06:13:53.1825960Z * [new branch] gh/williamwen42/289/base -> origin/gh/williamwen42/289/base 2025-09-07T06:13:53.1827548Z * [new branch] gh/williamwen42/289/head -> origin/gh/williamwen42/289/head 2025-09-07T06:13:53.1828598Z * [new branch] gh/williamwen42/289/orig -> origin/gh/williamwen42/289/orig 2025-09-07T06:13:53.1830965Z * [new branch] gh/wychi/1/base -> origin/gh/wychi/1/base 2025-09-07T06:13:53.1832115Z * [new branch] gh/wychi/1/head -> origin/gh/wychi/1/head 
2025-09-07T06:13:53.1833444Z * [new branch] gh/wychi/1/orig -> origin/gh/wychi/1/orig 2025-09-07T06:13:53.1835288Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-09-07T06:13:53.1836325Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-09-07T06:13:53.1837845Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-09-07T06:13:53.1838775Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-09-07T06:13:53.1840546Z * [new branch] gh/xmfan/18/base -> origin/gh/xmfan/18/base 2025-09-07T06:13:53.1841688Z * [new branch] gh/xmfan/18/head -> origin/gh/xmfan/18/head 2025-09-07T06:13:53.1843208Z * [new branch] gh/xmfan/229/base -> origin/gh/xmfan/229/base 2025-09-07T06:13:53.1844253Z * [new branch] gh/xmfan/229/head -> origin/gh/xmfan/229/head 2025-09-07T06:13:53.1845368Z * [new branch] gh/xmfan/229/orig -> origin/gh/xmfan/229/orig 2025-09-07T06:13:53.1846908Z * [new branch] gh/xmfan/237/base -> origin/gh/xmfan/237/base 2025-09-07T06:13:53.1848013Z * [new branch] gh/xmfan/237/head -> origin/gh/xmfan/237/head 2025-09-07T06:13:53.1849266Z * [new branch] gh/xmfan/237/orig -> origin/gh/xmfan/237/orig 2025-09-07T06:13:53.1853166Z * [new branch] gh/xmfan/244/base -> origin/gh/xmfan/244/base 2025-09-07T06:13:53.1854245Z * [new branch] gh/xmfan/244/head -> origin/gh/xmfan/244/head 2025-09-07T06:13:53.1855391Z * [new branch] gh/xmfan/244/orig -> origin/gh/xmfan/244/orig 2025-09-07T06:13:53.1857009Z * [new branch] gh/xmfan/246/base -> origin/gh/xmfan/246/base 2025-09-07T06:13:53.1858167Z * [new branch] gh/xmfan/246/head -> origin/gh/xmfan/246/head 2025-09-07T06:13:53.1859308Z * [new branch] gh/xmfan/246/orig -> origin/gh/xmfan/246/orig 2025-09-07T06:13:53.1860954Z * [new branch] gh/xmfan/253/base -> origin/gh/xmfan/253/base 2025-09-07T06:13:53.1862014Z * [new branch] gh/xmfan/253/head -> origin/gh/xmfan/253/head 2025-09-07T06:13:53.1863150Z * [new branch] gh/xmfan/253/orig -> origin/gh/xmfan/253/orig 
2025-09-07T06:13:53.1864846Z * [new branch] gh/xmfan/254/base -> origin/gh/xmfan/254/base 2025-09-07T06:13:53.1865874Z * [new branch] gh/xmfan/254/head -> origin/gh/xmfan/254/head 2025-09-07T06:13:53.1867022Z * [new branch] gh/xmfan/254/orig -> origin/gh/xmfan/254/orig 2025-09-07T06:13:53.1868577Z * [new branch] gh/xmfan/260/base -> origin/gh/xmfan/260/base 2025-09-07T06:13:53.1869601Z * [new branch] gh/xmfan/260/head -> origin/gh/xmfan/260/head 2025-09-07T06:13:53.1870763Z * [new branch] gh/xmfan/260/orig -> origin/gh/xmfan/260/orig 2025-09-07T06:13:53.1872347Z * [new branch] gh/xmfan/262/base -> origin/gh/xmfan/262/base 2025-09-07T06:13:53.1873460Z * [new branch] gh/xmfan/262/head -> origin/gh/xmfan/262/head 2025-09-07T06:13:53.1874584Z * [new branch] gh/xmfan/262/orig -> origin/gh/xmfan/262/orig 2025-09-07T06:13:53.1876282Z * [new branch] gh/xmfan/263/base -> origin/gh/xmfan/263/base 2025-09-07T06:13:53.1877308Z * [new branch] gh/xmfan/263/head -> origin/gh/xmfan/263/head 2025-09-07T06:13:53.1878454Z * [new branch] gh/xmfan/263/orig -> origin/gh/xmfan/263/orig 2025-09-07T06:13:53.1880022Z * [new branch] gh/xmfan/264/base -> origin/gh/xmfan/264/base 2025-09-07T06:13:53.1881076Z * [new branch] gh/xmfan/264/head -> origin/gh/xmfan/264/head 2025-09-07T06:13:53.1882166Z * [new branch] gh/xmfan/264/orig -> origin/gh/xmfan/264/orig 2025-09-07T06:13:53.1883738Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-09-07T06:13:53.1884804Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-09-07T06:13:53.1885877Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-09-07T06:13:53.1887478Z * [new branch] gh/xmfan/276/base -> origin/gh/xmfan/276/base 2025-09-07T06:13:53.1889143Z * [new branch] gh/xmfan/276/head -> origin/gh/xmfan/276/head 2025-09-07T06:13:53.1890369Z * [new branch] gh/xmfan/276/orig -> origin/gh/xmfan/276/orig 2025-09-07T06:13:53.1892920Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 
2025-09-07T06:13:53.1893959Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-09-07T06:13:53.1895112Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-09-07T06:13:53.1896768Z * [new branch] gh/xmfan/278/base -> origin/gh/xmfan/278/base 2025-09-07T06:13:53.1897795Z * [new branch] gh/xmfan/278/head -> origin/gh/xmfan/278/head 2025-09-07T06:13:53.1898935Z * [new branch] gh/xmfan/278/orig -> origin/gh/xmfan/278/orig 2025-09-07T06:13:53.1901461Z * [new branch] gh/xmfan/279/base -> origin/gh/xmfan/279/base 2025-09-07T06:13:53.1902587Z * [new branch] gh/xmfan/279/head -> origin/gh/xmfan/279/head 2025-09-07T06:13:53.1904333Z * [new branch] gh/xmfan/279/orig -> origin/gh/xmfan/279/orig 2025-09-07T06:13:53.1905898Z * [new branch] gh/xmfan/280/base -> origin/gh/xmfan/280/base 2025-09-07T06:13:53.1907035Z * [new branch] gh/xmfan/280/head -> origin/gh/xmfan/280/head 2025-09-07T06:13:53.1908209Z * [new branch] gh/xmfan/280/orig -> origin/gh/xmfan/280/orig 2025-09-07T06:13:53.1909801Z * [new branch] gh/xmfan/281/base -> origin/gh/xmfan/281/base 2025-09-07T06:13:53.1911353Z * [new branch] gh/xmfan/281/head -> origin/gh/xmfan/281/head 2025-09-07T06:13:53.1912480Z * [new branch] gh/xmfan/281/orig -> origin/gh/xmfan/281/orig 2025-09-07T06:13:53.1914117Z * [new branch] gh/xmfan/282/base -> origin/gh/xmfan/282/base 2025-09-07T06:13:53.1915202Z * [new branch] gh/xmfan/282/head -> origin/gh/xmfan/282/head 2025-09-07T06:13:53.1916808Z * [new branch] gh/xmfan/283/base -> origin/gh/xmfan/283/base 2025-09-07T06:13:53.1917840Z * [new branch] gh/xmfan/283/head -> origin/gh/xmfan/283/head 2025-09-07T06:13:53.1918973Z * [new branch] gh/xmfan/283/orig -> origin/gh/xmfan/283/orig 2025-09-07T06:13:53.1920889Z * [new branch] gh/xuanzhang816/14/base -> origin/gh/xuanzhang816/14/base 2025-09-07T06:13:53.1926039Z * [new branch] gh/xuanzhang816/14/head -> origin/gh/xuanzhang816/14/head 2025-09-07T06:13:53.1927145Z * [new branch] gh/xuanzhang816/14/orig -> 
origin/gh/xuanzhang816/14/orig 2025-09-07T06:13:53.1928780Z * [new branch] gh/xuanzhang816/19/base -> origin/gh/xuanzhang816/19/base 2025-09-07T06:13:53.1929843Z * [new branch] gh/xuanzhang816/19/head -> origin/gh/xuanzhang816/19/head 2025-09-07T06:13:53.1931004Z * [new branch] gh/xuanzhang816/19/orig -> origin/gh/xuanzhang816/19/orig 2025-09-07T06:13:53.1932938Z * [new branch] gh/xuanzhang816/22/base -> origin/gh/xuanzhang816/22/base 2025-09-07T06:13:53.1933996Z * [new branch] gh/xuanzhang816/22/head -> origin/gh/xuanzhang816/22/head 2025-09-07T06:13:53.1935117Z * [new branch] gh/xuanzhang816/22/orig -> origin/gh/xuanzhang816/22/orig 2025-09-07T06:13:53.1936777Z * [new branch] gh/xuanzhang816/23/base -> origin/gh/xuanzhang816/23/base 2025-09-07T06:13:53.1937824Z * [new branch] gh/xuanzhang816/23/head -> origin/gh/xuanzhang816/23/head 2025-09-07T06:13:53.1938966Z * [new branch] gh/xuanzhang816/23/orig -> origin/gh/xuanzhang816/23/orig 2025-09-07T06:13:53.1941069Z * [new branch] gh/xuanzhang816/24/base -> origin/gh/xuanzhang816/24/base 2025-09-07T06:13:53.1942206Z * [new branch] gh/xuanzhang816/24/head -> origin/gh/xuanzhang816/24/head 2025-09-07T06:13:53.1943424Z * [new branch] gh/xuanzhang816/24/orig -> origin/gh/xuanzhang816/24/orig 2025-09-07T06:13:53.1945044Z * [new branch] gh/xuanzhang816/25/base -> origin/gh/xuanzhang816/25/base 2025-09-07T06:13:53.1946044Z * [new branch] gh/xuanzhang816/25/head -> origin/gh/xuanzhang816/25/head 2025-09-07T06:13:53.1947166Z * [new branch] gh/xuanzhang816/25/orig -> origin/gh/xuanzhang816/25/orig 2025-09-07T06:13:53.1948907Z * [new branch] gh/xuanzhang816/26/base -> origin/gh/xuanzhang816/26/base 2025-09-07T06:13:53.1950233Z * [new branch] gh/xuanzhang816/26/head -> origin/gh/xuanzhang816/26/head 2025-09-07T06:13:53.1951371Z * [new branch] gh/xuanzhang816/26/orig -> origin/gh/xuanzhang816/26/orig 2025-09-07T06:13:53.1953440Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-09-07T06:13:53.1954555Z * [new 
branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-09-07T06:13:53.1955774Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-09-07T06:13:53.1957515Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-09-07T06:13:53.1958574Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-09-07T06:13:53.1959751Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-09-07T06:13:53.1961501Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-09-07T06:13:53.1962547Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-09-07T06:13:53.1963665Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-09-07T06:13:53.1965260Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-09-07T06:13:53.1966437Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-09-07T06:13:53.1967549Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-09-07T06:13:53.1969046Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-09-07T06:13:53.1970152Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-09-07T06:13:53.1971225Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-09-07T06:13:53.1973174Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-09-07T06:13:53.1974239Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-09-07T06:13:53.1975397Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-09-07T06:13:53.1976983Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-09-07T06:13:53.1978070Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-09-07T06:13:53.1979234Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-09-07T06:13:53.1980971Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 
2025-09-07T06:13:53.1982030Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-09-07T06:13:53.1983213Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-09-07T06:13:53.1984976Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-09-07T06:13:53.1986049Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-09-07T06:13:53.1987540Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-09-07T06:13:53.1988659Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-09-07T06:13:53.1989896Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-09-07T06:13:53.1991379Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-09-07T06:13:53.1992385Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-09-07T06:13:53.1993517Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-09-07T06:13:53.1995104Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-09-07T06:13:53.1996208Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-09-07T06:13:53.1997302Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-09-07T06:13:53.1998881Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-09-07T06:13:53.1999941Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-09-07T06:13:53.2001075Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-09-07T06:13:53.2002654Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-09-07T06:13:53.2004099Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-09-07T06:13:53.2004939Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-09-07T06:13:53.2006474Z * [new branch] gh/yanbing-j/36/base -> origin/gh/yanbing-j/36/base 2025-09-07T06:13:53.2007579Z * [new branch] gh/yanbing-j/36/head -> 
origin/gh/yanbing-j/36/head 2025-09-07T06:13:53.2008708Z * [new branch] gh/yanbing-j/36/orig -> origin/gh/yanbing-j/36/orig 2025-09-07T06:13:53.2010298Z * [new branch] gh/yanbing-j/37/base -> origin/gh/yanbing-j/37/base 2025-09-07T06:13:53.2011445Z * [new branch] gh/yanbing-j/37/head -> origin/gh/yanbing-j/37/head 2025-09-07T06:13:53.2012879Z * [new branch] gh/yanbing-j/37/orig -> origin/gh/yanbing-j/37/orig 2025-09-07T06:13:53.2014789Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-09-07T06:13:53.2015890Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-09-07T06:13:53.2017025Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-09-07T06:13:53.2018753Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-09-07T06:13:53.2019908Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-09-07T06:13:53.2021064Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-09-07T06:13:53.2022663Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-09-07T06:13:53.2023857Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-09-07T06:13:53.2024978Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-09-07T06:13:53.2026545Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-09-07T06:13:53.2027590Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-09-07T06:13:53.2028699Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-09-07T06:13:53.2030726Z * [new branch] gh/yangw-dev/16/base -> origin/gh/yangw-dev/16/base 2025-09-07T06:13:53.2031970Z * [new branch] gh/yangw-dev/16/head -> origin/gh/yangw-dev/16/head 2025-09-07T06:13:53.2033072Z * [new branch] gh/yangw-dev/16/orig -> origin/gh/yangw-dev/16/orig 2025-09-07T06:13:53.2034737Z * [new branch] gh/yangw-dev/17/base -> origin/gh/yangw-dev/17/base 2025-09-07T06:13:53.2035963Z * [new branch] 
gh/yangw-dev/17/head -> origin/gh/yangw-dev/17/head 2025-09-07T06:13:53.2036974Z * [new branch] gh/yangw-dev/17/orig -> origin/gh/yangw-dev/17/orig 2025-09-07T06:13:53.2038542Z * [new branch] gh/yangw-dev/18/base -> origin/gh/yangw-dev/18/base 2025-09-07T06:13:53.2039601Z * [new branch] gh/yangw-dev/18/head -> origin/gh/yangw-dev/18/head 2025-09-07T06:13:53.2040718Z * [new branch] gh/yangw-dev/18/orig -> origin/gh/yangw-dev/18/orig 2025-09-07T06:13:53.2042314Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-09-07T06:13:53.2043318Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-09-07T06:13:53.2044411Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-09-07T06:13:53.2045975Z * [new branch] gh/yangw-dev/20/base -> origin/gh/yangw-dev/20/base 2025-09-07T06:13:53.2047041Z * [new branch] gh/yangw-dev/20/head -> origin/gh/yangw-dev/20/head 2025-09-07T06:13:53.2048107Z * [new branch] gh/yangw-dev/20/orig -> origin/gh/yangw-dev/20/orig 2025-09-07T06:13:53.2052707Z * [new branch] gh/yangw-dev/21/base -> origin/gh/yangw-dev/21/base 2025-09-07T06:13:53.2053814Z * [new branch] gh/yangw-dev/21/head -> origin/gh/yangw-dev/21/head 2025-09-07T06:13:53.2055078Z * [new branch] gh/yangw-dev/21/orig -> origin/gh/yangw-dev/21/orig 2025-09-07T06:13:53.2056673Z * [new branch] gh/yangw-dev/22/base -> origin/gh/yangw-dev/22/base 2025-09-07T06:13:53.2057760Z * [new branch] gh/yangw-dev/22/head -> origin/gh/yangw-dev/22/head 2025-09-07T06:13:53.2058929Z * [new branch] gh/yangw-dev/22/orig -> origin/gh/yangw-dev/22/orig 2025-09-07T06:13:53.2060487Z * [new branch] gh/yangw-dev/23/base -> origin/gh/yangw-dev/23/base 2025-09-07T06:13:53.2061574Z * [new branch] gh/yangw-dev/23/head -> origin/gh/yangw-dev/23/head 2025-09-07T06:13:53.2062717Z * [new branch] gh/yangw-dev/23/orig -> origin/gh/yangw-dev/23/orig 2025-09-07T06:13:53.2064397Z * [new branch] gh/yangw-dev/24/base -> origin/gh/yangw-dev/24/base 
2025-09-07T06:13:53.2065450Z * [new branch] gh/yangw-dev/24/head -> origin/gh/yangw-dev/24/head 2025-09-07T06:13:53.2066522Z * [new branch] gh/yangw-dev/24/orig -> origin/gh/yangw-dev/24/orig 2025-09-07T06:13:53.2068163Z * [new branch] gh/yangw-dev/25/base -> origin/gh/yangw-dev/25/base 2025-09-07T06:13:53.2069251Z * [new branch] gh/yangw-dev/25/head -> origin/gh/yangw-dev/25/head 2025-09-07T06:13:53.2070397Z * [new branch] gh/yangw-dev/25/orig -> origin/gh/yangw-dev/25/orig 2025-09-07T06:13:53.2071936Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-09-07T06:13:53.2072997Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-09-07T06:13:53.2074093Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-09-07T06:13:53.2075684Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-09-07T06:13:53.2076754Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-09-07T06:13:53.2077837Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-09-07T06:13:53.2079831Z * [new branch] gh/ydwu4/233/base -> origin/gh/ydwu4/233/base 2025-09-07T06:13:53.2080959Z * [new branch] gh/ydwu4/233/head -> origin/gh/ydwu4/233/head 2025-09-07T06:13:53.2082074Z * [new branch] gh/ydwu4/233/orig -> origin/gh/ydwu4/233/orig 2025-09-07T06:13:53.2084065Z * [new branch] gh/ydwu4/246/base -> origin/gh/ydwu4/246/base 2025-09-07T06:13:53.2084992Z * [new branch] gh/ydwu4/246/head -> origin/gh/ydwu4/246/head 2025-09-07T06:13:53.2086109Z * [new branch] gh/ydwu4/246/orig -> origin/gh/ydwu4/246/orig 2025-09-07T06:13:53.2087839Z * [new branch] gh/ydwu4/253/base -> origin/gh/ydwu4/253/base 2025-09-07T06:13:53.2088973Z * [new branch] gh/ydwu4/253/head -> origin/gh/ydwu4/253/head 2025-09-07T06:13:53.2090661Z * [new branch] gh/ydwu4/253/orig -> origin/gh/ydwu4/253/orig 2025-09-07T06:13:53.2092519Z * [new branch] gh/ydwu4/255/base -> origin/gh/ydwu4/255/base 2025-09-07T06:13:53.2093652Z * [new branch] 
gh/ydwu4/255/head -> origin/gh/ydwu4/255/head 2025-09-07T06:13:53.2094870Z * [new branch] gh/ydwu4/255/orig -> origin/gh/ydwu4/255/orig 2025-09-07T06:13:53.2096563Z * [new branch] gh/ydwu4/259/base -> origin/gh/ydwu4/259/base 2025-09-07T06:13:53.2097689Z * [new branch] gh/ydwu4/259/head -> origin/gh/ydwu4/259/head 2025-09-07T06:13:53.2098847Z * [new branch] gh/ydwu4/259/orig -> origin/gh/ydwu4/259/orig 2025-09-07T06:13:53.2100621Z * [new branch] gh/ydwu4/262/base -> origin/gh/ydwu4/262/base 2025-09-07T06:13:53.2102035Z * [new branch] gh/ydwu4/262/head -> origin/gh/ydwu4/262/head 2025-09-07T06:13:53.2103124Z * [new branch] gh/ydwu4/262/orig -> origin/gh/ydwu4/262/orig 2025-09-07T06:13:53.2104835Z * [new branch] gh/ydwu4/263/base -> origin/gh/ydwu4/263/base 2025-09-07T06:13:53.2105898Z * [new branch] gh/ydwu4/263/head -> origin/gh/ydwu4/263/head 2025-09-07T06:13:53.2107041Z * [new branch] gh/ydwu4/263/orig -> origin/gh/ydwu4/263/orig 2025-09-07T06:13:53.2108796Z * [new branch] gh/ydwu4/269/base -> origin/gh/ydwu4/269/base 2025-09-07T06:13:53.2110296Z * [new branch] gh/ydwu4/269/head -> origin/gh/ydwu4/269/head 2025-09-07T06:13:53.2111320Z * [new branch] gh/ydwu4/269/orig -> origin/gh/ydwu4/269/orig 2025-09-07T06:13:53.2112986Z * [new branch] gh/ydwu4/270/base -> origin/gh/ydwu4/270/base 2025-09-07T06:13:53.2114050Z * [new branch] gh/ydwu4/270/head -> origin/gh/ydwu4/270/head 2025-09-07T06:13:53.2115220Z * [new branch] gh/ydwu4/270/orig -> origin/gh/ydwu4/270/orig 2025-09-07T06:13:53.2116990Z * [new branch] gh/ydwu4/272/base -> origin/gh/ydwu4/272/base 2025-09-07T06:13:53.2118287Z * [new branch] gh/ydwu4/272/head -> origin/gh/ydwu4/272/head 2025-09-07T06:13:53.2119376Z * [new branch] gh/ydwu4/272/orig -> origin/gh/ydwu4/272/orig 2025-09-07T06:13:53.2120801Z * [new branch] gh/ydwu4/275/base -> origin/gh/ydwu4/275/base 2025-09-07T06:13:53.2121857Z * [new branch] gh/ydwu4/275/head -> origin/gh/ydwu4/275/head 2025-09-07T06:13:53.2122981Z * [new branch] gh/ydwu4/275/orig 
-> origin/gh/ydwu4/275/orig 2025-09-07T06:13:53.2124395Z * [new branch] gh/ydwu4/276/base -> origin/gh/ydwu4/276/base 2025-09-07T06:13:53.2125502Z * [new branch] gh/ydwu4/276/head -> origin/gh/ydwu4/276/head 2025-09-07T06:13:53.2126621Z * [new branch] gh/ydwu4/276/orig -> origin/gh/ydwu4/276/orig 2025-09-07T06:13:53.2128302Z * [new branch] gh/ydwu4/279/base -> origin/gh/ydwu4/279/base 2025-09-07T06:13:53.2129522Z * [new branch] gh/ydwu4/279/head -> origin/gh/ydwu4/279/head 2025-09-07T06:13:53.2130658Z * [new branch] gh/ydwu4/279/orig -> origin/gh/ydwu4/279/orig 2025-09-07T06:13:53.2133077Z * [new branch] gh/ydwu4/283/base -> origin/gh/ydwu4/283/base 2025-09-07T06:13:53.2134081Z * [new branch] gh/ydwu4/283/head -> origin/gh/ydwu4/283/head 2025-09-07T06:13:53.2135275Z * [new branch] gh/ydwu4/283/orig -> origin/gh/ydwu4/283/orig 2025-09-07T06:13:53.2136878Z * [new branch] gh/ydwu4/289/base -> origin/gh/ydwu4/289/base 2025-09-07T06:13:53.2137983Z * [new branch] gh/ydwu4/289/head -> origin/gh/ydwu4/289/head 2025-09-07T06:13:53.2139104Z * [new branch] gh/ydwu4/289/orig -> origin/gh/ydwu4/289/orig 2025-09-07T06:13:53.2140870Z * [new branch] gh/ydwu4/290/base -> origin/gh/ydwu4/290/base 2025-09-07T06:13:53.2142062Z * [new branch] gh/ydwu4/290/head -> origin/gh/ydwu4/290/head 2025-09-07T06:13:53.2143218Z * [new branch] gh/ydwu4/290/orig -> origin/gh/ydwu4/290/orig 2025-09-07T06:13:53.2145388Z * [new branch] gh/ydwu4/291/base -> origin/gh/ydwu4/291/base 2025-09-07T06:13:53.2146506Z * [new branch] gh/ydwu4/291/head -> origin/gh/ydwu4/291/head 2025-09-07T06:13:53.2147681Z * [new branch] gh/ydwu4/291/orig -> origin/gh/ydwu4/291/orig 2025-09-07T06:13:53.2149985Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-09-07T06:13:53.2150992Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-09-07T06:13:53.2152112Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-09-07T06:13:53.2153692Z * [new branch] gh/ydwu4/293/base -> 
origin/gh/ydwu4/293/base 2025-09-07T06:13:53.2154791Z * [new branch] gh/ydwu4/293/head -> origin/gh/ydwu4/293/head 2025-09-07T06:13:53.2155974Z * [new branch] gh/ydwu4/293/orig -> origin/gh/ydwu4/293/orig 2025-09-07T06:13:53.2157772Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-09-07T06:13:53.2158846Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-09-07T06:13:53.2159973Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-09-07T06:13:53.2161828Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-09-07T06:13:53.2162928Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-09-07T06:13:53.2164064Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-09-07T06:13:53.2165722Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-09-07T06:13:53.2166740Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-09-07T06:13:53.2167874Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-09-07T06:13:53.2170463Z * [new branch] gh/ydwu4/300/base -> origin/gh/ydwu4/300/base 2025-09-07T06:13:53.2172685Z * [new branch] gh/ydwu4/300/head -> origin/gh/ydwu4/300/head 2025-09-07T06:13:53.2173925Z * [new branch] gh/ydwu4/300/orig -> origin/gh/ydwu4/300/orig 2025-09-07T06:13:53.2176023Z * [new branch] gh/ydwu4/301/base -> origin/gh/ydwu4/301/base 2025-09-07T06:13:53.2177074Z * [new branch] gh/ydwu4/301/head -> origin/gh/ydwu4/301/head 2025-09-07T06:13:53.2178252Z * [new branch] gh/ydwu4/301/orig -> origin/gh/ydwu4/301/orig 2025-09-07T06:13:53.2179874Z * [new branch] gh/ydwu4/302/base -> origin/gh/ydwu4/302/base 2025-09-07T06:13:53.2180949Z * [new branch] gh/ydwu4/302/head -> origin/gh/ydwu4/302/head 2025-09-07T06:13:53.2182143Z * [new branch] gh/ydwu4/302/orig -> origin/gh/ydwu4/302/orig 2025-09-07T06:13:53.2184096Z * [new branch] gh/ydwu4/303/base -> origin/gh/ydwu4/303/base 2025-09-07T06:13:53.2184945Z * [new branch] gh/ydwu4/303/head -> 
origin/gh/ydwu4/303/head 2025-09-07T06:13:53.2186119Z * [new branch] gh/ydwu4/303/orig -> origin/gh/ydwu4/303/orig 2025-09-07T06:13:53.2187653Z * [new branch] gh/ydwu4/304/base -> origin/gh/ydwu4/304/base 2025-09-07T06:13:53.2188786Z * [new branch] gh/ydwu4/304/head -> origin/gh/ydwu4/304/head 2025-09-07T06:13:53.2189912Z * [new branch] gh/ydwu4/304/orig -> origin/gh/ydwu4/304/orig 2025-09-07T06:13:53.2191630Z * [new branch] gh/ydwu4/305/base -> origin/gh/ydwu4/305/base 2025-09-07T06:13:53.2192690Z * [new branch] gh/ydwu4/305/head -> origin/gh/ydwu4/305/head 2025-09-07T06:13:53.2193880Z * [new branch] gh/ydwu4/305/orig -> origin/gh/ydwu4/305/orig 2025-09-07T06:13:53.2195537Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-09-07T06:13:53.2196655Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-09-07T06:13:53.2197787Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-09-07T06:13:53.2199450Z * [new branch] gh/ydwu4/307/base -> origin/gh/ydwu4/307/base 2025-09-07T06:13:53.2200461Z * [new branch] gh/ydwu4/307/head -> origin/gh/ydwu4/307/head 2025-09-07T06:13:53.2201814Z * [new branch] gh/ydwu4/307/orig -> origin/gh/ydwu4/307/orig 2025-09-07T06:13:53.2203384Z * [new branch] gh/ydwu4/308/base -> origin/gh/ydwu4/308/base 2025-09-07T06:13:53.2204526Z * [new branch] gh/ydwu4/308/head -> origin/gh/ydwu4/308/head 2025-09-07T06:13:53.2205626Z * [new branch] gh/ydwu4/308/orig -> origin/gh/ydwu4/308/orig 2025-09-07T06:13:53.2207675Z * [new branch] gh/ydwu4/309/base -> origin/gh/ydwu4/309/base 2025-09-07T06:13:53.2208725Z * [new branch] gh/ydwu4/309/head -> origin/gh/ydwu4/309/head 2025-09-07T06:13:53.2209885Z * [new branch] gh/ydwu4/309/orig -> origin/gh/ydwu4/309/orig 2025-09-07T06:13:53.2211898Z * [new branch] gh/ydwu4/310/base -> origin/gh/ydwu4/310/base 2025-09-07T06:13:53.2213367Z * [new branch] gh/ydwu4/310/head -> origin/gh/ydwu4/310/head 2025-09-07T06:13:53.2214493Z * [new branch] gh/ydwu4/310/orig -> 
origin/gh/ydwu4/310/orig 2025-09-07T06:13:53.2216250Z * [new branch] gh/ydwu4/311/base -> origin/gh/ydwu4/311/base 2025-09-07T06:13:53.2217362Z * [new branch] gh/ydwu4/311/head -> origin/gh/ydwu4/311/head 2025-09-07T06:13:53.2218563Z * [new branch] gh/ydwu4/311/orig -> origin/gh/ydwu4/311/orig 2025-09-07T06:13:53.2220178Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-09-07T06:13:53.2221295Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-09-07T06:13:53.2222444Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-09-07T06:13:53.2224415Z * [new branch] gh/ydwu4/313/base -> origin/gh/ydwu4/313/base 2025-09-07T06:13:53.2225651Z * [new branch] gh/ydwu4/313/head -> origin/gh/ydwu4/313/head 2025-09-07T06:13:53.2226829Z * [new branch] gh/ydwu4/313/orig -> origin/gh/ydwu4/313/orig 2025-09-07T06:13:53.2228535Z * [new branch] gh/ydwu4/314/base -> origin/gh/ydwu4/314/base 2025-09-07T06:13:53.2229886Z * [new branch] gh/ydwu4/314/head -> origin/gh/ydwu4/314/head 2025-09-07T06:13:53.2230945Z * [new branch] gh/ydwu4/314/orig -> origin/gh/ydwu4/314/orig 2025-09-07T06:13:53.2232774Z * [new branch] gh/ydwu4/315/base -> origin/gh/ydwu4/315/base 2025-09-07T06:13:53.2233758Z * [new branch] gh/ydwu4/315/head -> origin/gh/ydwu4/315/head 2025-09-07T06:13:53.2234905Z * [new branch] gh/ydwu4/315/orig -> origin/gh/ydwu4/315/orig 2025-09-07T06:13:53.2236616Z * [new branch] gh/ydwu4/316/base -> origin/gh/ydwu4/316/base 2025-09-07T06:13:53.2237738Z * [new branch] gh/ydwu4/316/head -> origin/gh/ydwu4/316/head 2025-09-07T06:13:53.2238897Z * [new branch] gh/ydwu4/316/orig -> origin/gh/ydwu4/316/orig 2025-09-07T06:13:53.2240615Z * [new branch] gh/ydwu4/317/base -> origin/gh/ydwu4/317/base 2025-09-07T06:13:53.2241705Z * [new branch] gh/ydwu4/317/head -> origin/gh/ydwu4/317/head 2025-09-07T06:13:53.2243362Z * [new branch] gh/ydwu4/317/orig -> origin/gh/ydwu4/317/orig 2025-09-07T06:13:53.2245008Z * [new branch] gh/ydwu4/318/base -> 
origin/gh/ydwu4/318/base 2025-09-07T06:13:53.2246164Z * [new branch] gh/ydwu4/318/head -> origin/gh/ydwu4/318/head 2025-09-07T06:13:53.2247303Z * [new branch] gh/ydwu4/318/orig -> origin/gh/ydwu4/318/orig 2025-09-07T06:13:53.2249021Z * [new branch] gh/ydwu4/319/base -> origin/gh/ydwu4/319/base 2025-09-07T06:13:53.2250428Z * [new branch] gh/ydwu4/319/head -> origin/gh/ydwu4/319/head 2025-09-07T06:13:53.2251664Z * [new branch] gh/ydwu4/319/orig -> origin/gh/ydwu4/319/orig 2025-09-07T06:13:53.2253471Z * [new branch] gh/ydwu4/320/base -> origin/gh/ydwu4/320/base 2025-09-07T06:13:53.2254520Z * [new branch] gh/ydwu4/320/head -> origin/gh/ydwu4/320/head 2025-09-07T06:13:53.2255676Z * [new branch] gh/ydwu4/320/orig -> origin/gh/ydwu4/320/orig 2025-09-07T06:13:53.2257196Z * [new branch] gh/ydwu4/321/base -> origin/gh/ydwu4/321/base 2025-09-07T06:13:53.2258291Z * [new branch] gh/ydwu4/321/head -> origin/gh/ydwu4/321/head 2025-09-07T06:13:53.2259470Z * [new branch] gh/ydwu4/321/orig -> origin/gh/ydwu4/321/orig 2025-09-07T06:13:53.2261131Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-09-07T06:13:53.2262205Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-09-07T06:13:53.2263519Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-09-07T06:13:53.2265216Z * [new branch] gh/ydwu4/323/base -> origin/gh/ydwu4/323/base 2025-09-07T06:13:53.2266283Z * [new branch] gh/ydwu4/323/head -> origin/gh/ydwu4/323/head 2025-09-07T06:13:53.2267407Z * [new branch] gh/ydwu4/323/orig -> origin/gh/ydwu4/323/orig 2025-09-07T06:13:53.2269026Z * [new branch] gh/ydwu4/324/base -> origin/gh/ydwu4/324/base 2025-09-07T06:13:53.2270059Z * [new branch] gh/ydwu4/324/head -> origin/gh/ydwu4/324/head 2025-09-07T06:13:53.2271132Z * [new branch] gh/ydwu4/324/orig -> origin/gh/ydwu4/324/orig 2025-09-07T06:13:53.2273151Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-09-07T06:13:53.2274214Z * [new branch] gh/yf225/133/head -> 
origin/gh/yf225/133/head 2025-09-07T06:13:53.2276499Z * [new branch] gh/yf225/171/base -> origin/gh/yf225/171/base 2025-09-07T06:13:53.2277690Z * [new branch] gh/yf225/171/head -> origin/gh/yf225/171/head 2025-09-07T06:13:53.2278853Z * [new branch] gh/yf225/171/orig -> origin/gh/yf225/171/orig 2025-09-07T06:13:53.2280623Z * [new branch] gh/yf225/172/base -> origin/gh/yf225/172/base 2025-09-07T06:13:53.2281721Z * [new branch] gh/yf225/172/head -> origin/gh/yf225/172/head 2025-09-07T06:13:53.2282676Z * [new branch] gh/yf225/172/orig -> origin/gh/yf225/172/orig 2025-09-07T06:13:53.2284295Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-09-07T06:13:53.2285345Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-09-07T06:13:53.2287826Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-09-07T06:13:53.2289253Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-09-07T06:13:53.2290378Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-09-07T06:13:53.2292308Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-09-07T06:13:53.2293446Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-09-07T06:13:53.2294592Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-09-07T06:13:53.2296750Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-09-07T06:13:53.2297869Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-09-07T06:13:53.2299362Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-09-07T06:13:53.2300392Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-09-07T06:13:53.2302341Z * [new branch] gh/ysiraichi/79/base -> origin/gh/ysiraichi/79/base 2025-09-07T06:13:53.2303475Z * [new branch] gh/ysiraichi/79/head -> origin/gh/ysiraichi/79/head 2025-09-07T06:13:53.2304954Z * [new branch] gh/ysiraichi/79/orig -> origin/gh/ysiraichi/79/orig 
2025-09-07T06:13:53.2306504Z * [new branch] gh/ysiraichi/88/base -> origin/gh/ysiraichi/88/base 2025-09-07T06:13:53.2307566Z * [new branch] gh/ysiraichi/88/head -> origin/gh/ysiraichi/88/head 2025-09-07T06:13:53.2308712Z * [new branch] gh/ysiraichi/88/orig -> origin/gh/ysiraichi/88/orig 2025-09-07T06:13:53.2311212Z * [new branch] gh/zhxchen17/25/base -> origin/gh/zhxchen17/25/base 2025-09-07T06:13:53.2312266Z * [new branch] gh/zhxchen17/25/head -> origin/gh/zhxchen17/25/head 2025-09-07T06:13:53.2313528Z * [new branch] gh/zhxchen17/25/orig -> origin/gh/zhxchen17/25/orig 2025-09-07T06:13:53.2315349Z * [new branch] gh/zhxchen17/31/base -> origin/gh/zhxchen17/31/base 2025-09-07T06:13:53.2316463Z * [new branch] gh/zhxchen17/31/head -> origin/gh/zhxchen17/31/head 2025-09-07T06:13:53.2317636Z * [new branch] gh/zhxchen17/31/orig -> origin/gh/zhxchen17/31/orig 2025-09-07T06:13:53.2319296Z * [new branch] gh/zhxchen17/34/base -> origin/gh/zhxchen17/34/base 2025-09-07T06:13:53.2320356Z * [new branch] gh/zhxchen17/34/head -> origin/gh/zhxchen17/34/head 2025-09-07T06:13:53.2321793Z * [new branch] gh/zhxchen17/35/base -> origin/gh/zhxchen17/35/base 2025-09-07T06:13:53.2322786Z * [new branch] gh/zhxchen17/35/head -> origin/gh/zhxchen17/35/head 2025-09-07T06:13:53.2324734Z * [new branch] gh/zhxchen17/37/base -> origin/gh/zhxchen17/37/base 2025-09-07T06:13:53.2325803Z * [new branch] gh/zhxchen17/37/head -> origin/gh/zhxchen17/37/head 2025-09-07T06:13:53.2327003Z * [new branch] gh/zhxchen17/37/orig -> origin/gh/zhxchen17/37/orig 2025-09-07T06:13:53.2328786Z * [new branch] gh/zhxchen17/38/base -> origin/gh/zhxchen17/38/base 2025-09-07T06:13:53.2329806Z * [new branch] gh/zhxchen17/38/head -> origin/gh/zhxchen17/38/head 2025-09-07T06:13:53.2331013Z * [new branch] gh/zhxchen17/38/orig -> origin/gh/zhxchen17/38/orig 2025-09-07T06:13:53.2332802Z * [new branch] gh/zhxchen17/39/base -> origin/gh/zhxchen17/39/base 2025-09-07T06:13:53.2333972Z * [new branch] gh/zhxchen17/39/head -> 
origin/gh/zhxchen17/39/head 2025-09-07T06:13:53.2335183Z * [new branch] gh/zhxchen17/39/orig -> origin/gh/zhxchen17/39/orig 2025-09-07T06:13:53.2336963Z * [new branch] gh/zhxchen17/40/base -> origin/gh/zhxchen17/40/base 2025-09-07T06:13:53.2338148Z * [new branch] gh/zhxchen17/40/head -> origin/gh/zhxchen17/40/head 2025-09-07T06:13:53.2339419Z * [new branch] gh/zhxchen17/40/orig -> origin/gh/zhxchen17/40/orig 2025-09-07T06:13:53.2341269Z * [new branch] gh/zhxchen17/41/base -> origin/gh/zhxchen17/41/base 2025-09-07T06:13:53.2342476Z * [new branch] gh/zhxchen17/41/head -> origin/gh/zhxchen17/41/head 2025-09-07T06:13:53.2344105Z * [new branch] gh/zhxchen17/41/orig -> origin/gh/zhxchen17/41/orig 2025-09-07T06:13:53.2345856Z * [new branch] gh/zhxchen17/42/base -> origin/gh/zhxchen17/42/base 2025-09-07T06:13:53.2347076Z * [new branch] gh/zhxchen17/42/head -> origin/gh/zhxchen17/42/head 2025-09-07T06:13:53.2348451Z * [new branch] gh/zhxchen17/42/orig -> origin/gh/zhxchen17/42/orig 2025-09-07T06:13:53.2354011Z * [new branch] gh/zhxchen17/43/base -> origin/gh/zhxchen17/43/base 2025-09-07T06:13:53.2355294Z * [new branch] gh/zhxchen17/43/head -> origin/gh/zhxchen17/43/head 2025-09-07T06:13:53.2356540Z * [new branch] gh/zhxchen17/43/orig -> origin/gh/zhxchen17/43/orig 2025-09-07T06:13:53.2358362Z * [new branch] gh/zhxchen17/44/base -> origin/gh/zhxchen17/44/base 2025-09-07T06:13:53.2359400Z * [new branch] gh/zhxchen17/44/head -> origin/gh/zhxchen17/44/head 2025-09-07T06:13:53.2360624Z * [new branch] gh/zhxchen17/44/orig -> origin/gh/zhxchen17/44/orig 2025-09-07T06:13:53.2362460Z * [new branch] gh/zhxchen17/45/base -> origin/gh/zhxchen17/45/base 2025-09-07T06:13:53.2363615Z * [new branch] gh/zhxchen17/45/head -> origin/gh/zhxchen17/45/head 2025-09-07T06:13:53.2364884Z * [new branch] gh/zhxchen17/45/orig -> origin/gh/zhxchen17/45/orig 2025-09-07T06:13:53.2366805Z * [new branch] gh/zklaus/10/base -> origin/gh/zklaus/10/base 2025-09-07T06:13:53.2367898Z * [new branch] 
gh/zklaus/10/head -> origin/gh/zklaus/10/head 2025-09-07T06:13:53.2369022Z * [new branch] gh/zklaus/10/orig -> origin/gh/zklaus/10/orig 2025-09-07T06:13:53.2370594Z * [new branch] gh/zklaus/11/base -> origin/gh/zklaus/11/base 2025-09-07T06:13:53.2371909Z * [new branch] gh/zklaus/11/head -> origin/gh/zklaus/11/head 2025-09-07T06:13:53.2373174Z * [new branch] gh/zklaus/11/orig -> origin/gh/zklaus/11/orig 2025-09-07T06:13:53.2374795Z * [new branch] gh/zklaus/12/base -> origin/gh/zklaus/12/base 2025-09-07T06:13:53.2375877Z * [new branch] gh/zklaus/12/head -> origin/gh/zklaus/12/head 2025-09-07T06:13:53.2376990Z * [new branch] gh/zklaus/12/orig -> origin/gh/zklaus/12/orig 2025-09-07T06:13:53.2378792Z * [new branch] gh/zklaus/14/base -> origin/gh/zklaus/14/base 2025-09-07T06:13:53.2379913Z * [new branch] gh/zklaus/14/head -> origin/gh/zklaus/14/head 2025-09-07T06:13:53.2381057Z * [new branch] gh/zklaus/14/orig -> origin/gh/zklaus/14/orig 2025-09-07T06:13:53.2382713Z * [new branch] gh/zklaus/15/base -> origin/gh/zklaus/15/base 2025-09-07T06:13:53.2383861Z * [new branch] gh/zklaus/15/head -> origin/gh/zklaus/15/head 2025-09-07T06:13:53.2385186Z * [new branch] gh/zklaus/15/orig -> origin/gh/zklaus/15/orig 2025-09-07T06:13:53.2386656Z * [new branch] gh/zklaus/16/base -> origin/gh/zklaus/16/base 2025-09-07T06:13:53.2388378Z * [new branch] gh/zklaus/16/head -> origin/gh/zklaus/16/head 2025-09-07T06:13:53.2389502Z * [new branch] gh/zklaus/16/orig -> origin/gh/zklaus/16/orig 2025-09-07T06:13:53.2391105Z * [new branch] gh/zklaus/17/base -> origin/gh/zklaus/17/base 2025-09-07T06:13:53.2392170Z * [new branch] gh/zklaus/17/head -> origin/gh/zklaus/17/head 2025-09-07T06:13:53.2393349Z * [new branch] gh/zklaus/17/orig -> origin/gh/zklaus/17/orig 2025-09-07T06:13:53.2394932Z * [new branch] gh/zklaus/18/base -> origin/gh/zklaus/18/base 2025-09-07T06:13:53.2396058Z * [new branch] gh/zklaus/18/head -> origin/gh/zklaus/18/head 2025-09-07T06:13:53.2397196Z * [new branch] gh/zklaus/18/orig 
-> origin/gh/zklaus/18/orig 2025-09-07T06:13:53.2398737Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-09-07T06:13:53.2399857Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-09-07T06:13:53.2400995Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-09-07T06:13:53.2402553Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-09-07T06:13:53.2403609Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-09-07T06:13:53.2404739Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-09-07T06:13:53.2406331Z * [new branch] gh/zklaus/7/base -> origin/gh/zklaus/7/base 2025-09-07T06:13:53.2407370Z * [new branch] gh/zklaus/7/head -> origin/gh/zklaus/7/head 2025-09-07T06:13:53.2408473Z * [new branch] gh/zklaus/7/orig -> origin/gh/zklaus/7/orig 2025-09-07T06:13:53.2410078Z * [new branch] gh/zklaus/9/base -> origin/gh/zklaus/9/base 2025-09-07T06:13:53.2411142Z * [new branch] gh/zklaus/9/head -> origin/gh/zklaus/9/head 2025-09-07T06:13:53.2412632Z * [new branch] gh/zklaus/9/orig -> origin/gh/zklaus/9/orig 2025-09-07T06:13:53.2414547Z * [new branch] gh/zou3519/1175/base -> origin/gh/zou3519/1175/base 2025-09-07T06:13:53.2415631Z * [new branch] gh/zou3519/1175/head -> origin/gh/zou3519/1175/head 2025-09-07T06:13:53.2416820Z * [new branch] gh/zou3519/1175/orig -> origin/gh/zou3519/1175/orig 2025-09-07T06:13:53.2418534Z * [new branch] gh/zou3519/1177/base -> origin/gh/zou3519/1177/base 2025-09-07T06:13:53.2419651Z * [new branch] gh/zou3519/1177/head -> origin/gh/zou3519/1177/head 2025-09-07T06:13:53.2420828Z * [new branch] gh/zou3519/1177/orig -> origin/gh/zou3519/1177/orig 2025-09-07T06:13:53.2422532Z * [new branch] gh/zou3519/1191/base -> origin/gh/zou3519/1191/base 2025-09-07T06:13:53.2423839Z * [new branch] gh/zou3519/1191/head -> origin/gh/zou3519/1191/head 2025-09-07T06:13:53.2425217Z * [new branch] gh/zou3519/1191/orig -> origin/gh/zou3519/1191/orig 2025-09-07T06:13:53.2426802Z * [new 
branch] gh/zou3519/1192/base -> origin/gh/zou3519/1192/base 2025-09-07T06:13:53.2427906Z * [new branch] gh/zou3519/1192/head -> origin/gh/zou3519/1192/head 2025-09-07T06:13:53.2429064Z * [new branch] gh/zou3519/1192/orig -> origin/gh/zou3519/1192/orig 2025-09-07T06:13:53.2430481Z * [new branch] gh/zou3519/1193/base -> origin/gh/zou3519/1193/base 2025-09-07T06:13:53.2431620Z * [new branch] gh/zou3519/1193/head -> origin/gh/zou3519/1193/head 2025-09-07T06:13:53.2432707Z * [new branch] gh/zou3519/1193/orig -> origin/gh/zou3519/1193/orig 2025-09-07T06:13:53.2434114Z * [new branch] gh/zou3519/1194/base -> origin/gh/zou3519/1194/base 2025-09-07T06:13:53.2435258Z * [new branch] gh/zou3519/1194/head -> origin/gh/zou3519/1194/head 2025-09-07T06:13:53.2436402Z * [new branch] gh/zou3519/1194/orig -> origin/gh/zou3519/1194/orig 2025-09-07T06:13:53.2437996Z * [new branch] gh/zou3519/1195/base -> origin/gh/zou3519/1195/base 2025-09-07T06:13:53.2439206Z * [new branch] gh/zou3519/1195/head -> origin/gh/zou3519/1195/head 2025-09-07T06:13:53.2440559Z * [new branch] gh/zou3519/1195/orig -> origin/gh/zou3519/1195/orig 2025-09-07T06:13:53.2441945Z * [new branch] gh/zou3519/1196/base -> origin/gh/zou3519/1196/base 2025-09-07T06:13:53.2443043Z * [new branch] gh/zou3519/1196/head -> origin/gh/zou3519/1196/head 2025-09-07T06:13:53.2444166Z * [new branch] gh/zou3519/1196/orig -> origin/gh/zou3519/1196/orig 2025-09-07T06:13:53.2445685Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-09-07T06:13:53.2446779Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-09-07T06:13:53.2447929Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-09-07T06:13:53.2450476Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-09-07T06:13:53.2451565Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-09-07T06:13:53.2453372Z * [new branch] gh/zpcore/10/base -> origin/gh/zpcore/10/base 2025-09-07T06:13:53.2454459Z * [new 
branch] gh/zpcore/10/head -> origin/gh/zpcore/10/head 2025-09-07T06:13:53.2455839Z * [new branch] gh/zpcore/10/orig -> origin/gh/zpcore/10/orig 2025-09-07T06:13:53.2457938Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-09-07T06:13:53.2459152Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-09-07T06:13:53.2460314Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-09-07T06:13:53.2462183Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-09-07T06:13:53.2463666Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-09-07T06:13:53.2464833Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-09-07T06:13:53.2466504Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-09-07T06:13:53.2467604Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-09-07T06:13:53.2468711Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-09-07T06:13:53.2470339Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-09-07T06:13:53.2471452Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-09-07T06:13:53.2473159Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-09-07T06:13:53.2474286Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-09-07T06:13:53.2476172Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-09-07T06:13:53.2477155Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-09-07T06:13:53.2478592Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-09-07T06:13:53.2479612Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-09-07T06:13:53.2481158Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-09-07T06:13:53.2482096Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-09-07T06:13:53.2483519Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-09-07T06:13:53.2484501Z * [new branch] gh/zpcore/6/head -> 
origin/gh/zpcore/6/head 2025-09-07T06:13:53.2486021Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-09-07T06:13:53.2487005Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-09-07T06:13:53.2488870Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-09-07T06:13:53.2489928Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-09-07T06:13:53.2491407Z * [new branch] google-main -> origin/google-main 2025-09-07T06:13:53.2493406Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-09-07T06:13:53.2494432Z * [new branch] guangyey/host_alloc -> origin/guangyey/host_alloc 2025-09-07T06:13:53.2495501Z * [new branch] guangyey/reimport -> origin/guangyey/reimport 2025-09-07T06:13:53.2496703Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-09-07T06:13:53.2498612Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-09-07T06:13:53.2500012Z * [new branch] haozhe/bf16-dynamic-shape -> origin/haozhe/bf16-dynamic-shape 2025-09-07T06:13:53.2501169Z * [new branch] hc_baseline -> origin/hc_baseline 2025-09-07T06:13:53.2502543Z * [new branch] hf_update -> origin/hf_update 2025-09-07T06:13:53.2503828Z * [new branch] hhh_decomp_mul -> origin/hhh_decomp_mul 2025-09-07T06:13:53.2504905Z * [new branch] hhh_rand -> origin/hhh_rand 2025-09-07T06:13:53.2506493Z * [new branch] hoy/mmsplitk -> origin/hoy/mmsplitk 2025-09-07T06:13:53.2507627Z * [new branch] hoy/triton-PR3973 -> origin/hoy/triton-PR3973 2025-09-07T06:13:53.2508831Z * [new branch] hoy/triton-coalescing-baseline -> origin/hoy/triton-coalescing-baseline 2025-09-07T06:13:53.2509883Z * [new branch] hoy/triton-coalescing-new -> origin/hoy/triton-coalescing-new 2025-09-07T06:13:53.2510923Z * [new branch] hoy/triton-coalescing-vec -> origin/hoy/triton-coalescing-vec 2025-09-07T06:13:53.2512037Z * [new branch] inductordecompfix -> origin/inductordecompfix 2025-09-07T06:13:53.2513663Z 
* [new branch] inline -> origin/inline 2025-09-07T06:13:53.2514877Z * [new branch] inlining -> origin/inlining 2025-09-07T06:13:53.2516038Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-09-07T06:13:53.2517240Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-09-07T06:13:53.2518332Z * [new branch] int8_sdpa -> origin/int8_sdpa 2025-09-07T06:13:53.2519549Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-09-07T06:13:53.2520840Z * [new branch] issue#58739 -> origin/issue#58739 2025-09-07T06:13:53.2522793Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-09-07T06:13:53.2524312Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-09-07T06:13:53.2525900Z * [new branch] jeanschmidt/disable_rocm_build_tests -> origin/jeanschmidt/disable_rocm_build_tests 2025-09-07T06:13:53.2527195Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-09-07T06:13:53.2528340Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-09-07T06:13:53.2529895Z * [new branch] justinchu/attention-tests -> origin/justinchu/attention-tests 2025-09-07T06:13:53.2530984Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-09-07T06:13:53.2532451Z * [new branch] justinchu/ort-122 -> origin/justinchu/ort-122 2025-09-07T06:13:53.2534173Z * [new branch] justinchuby/dynamo-true -> origin/justinchuby/dynamo-true 2025-09-07T06:13:53.2535690Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-09-07T06:13:53.2536975Z * [new branch] kainan_test -> origin/kainan_test 2025-09-07T06:13:53.2538175Z * [new branch] learnablebias -> origin/learnablebias 2025-09-07T06:13:53.2539858Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-09-07T06:13:53.2541417Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 
2025-09-07T06:13:53.2542905Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-09-07T06:13:53.2544142Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-09-07T06:13:53.2545068Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-09-07T06:13:53.2546245Z * [new branch] lintbuilddocker -> origin/lintbuilddocker 2025-09-07T06:13:53.2547253Z * [new branch] llama4-stable -> origin/llama4-stable 2025-09-07T06:13:53.2548525Z * [new branch] logdetfix -> origin/logdetfix 2025-09-07T06:13:53.2551043Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-09-07T06:13:53.2552717Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-09-07T06:13:53.2553887Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-09-07T06:13:53.2555002Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-09-07T06:13:53.2556127Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-09-07T06:13:53.2557368Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-09-07T06:13:53.2558222Z * [new branch] lucaskabela/issue_120648 -> origin/lucaskabela/issue_120648 2025-09-07T06:13:53.2559789Z * [new branch] lucaskabela/misc_typing_dynamo -> origin/lucaskabela/misc_typing_dynamo 2025-09-07T06:13:53.2561553Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-09-07T06:13:53.2562824Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-09-07T06:13:53.2563756Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-09-07T06:13:53.2564944Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-09-07T06:13:53.2566171Z * [new branch] lucaskabela/typing_symbolic_convert -> 
origin/lucaskabela/typing_symbolic_convert 2025-09-07T06:13:53.2567345Z * [new branch] lucaskabela/typing_utils_improvements -> origin/lucaskabela/typing_utils_improvements 2025-09-07T06:13:53.2568474Z * [new branch] main -> origin/main 2025-09-07T06:13:53.2569913Z * [new branch] main-enable-b200-distributed-tests -> origin/main-enable-b200-distributed-tests 2025-09-07T06:13:53.2571242Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-09-07T06:13:53.2572850Z * [new branch] malfet-patch-12 -> origin/malfet-patch-12 2025-09-07T06:13:53.2574116Z * [new branch] malfet-patch-14 -> origin/malfet-patch-14 2025-09-07T06:13:53.2575456Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-09-07T06:13:53.2576779Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-09-07T06:13:53.2578779Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-09-07T06:13:53.2579580Z * [new branch] malfet/delete-upsteam-cuda -> origin/malfet/delete-upsteam-cuda 2025-09-07T06:13:53.2580730Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-09-07T06:13:53.2582357Z * [new branch] manuel/test-ops-common-allow-mps -> origin/manuel/test-ops-common-allow-mps 2025-09-07T06:13:53.2583489Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-09-07T06:13:53.2585213Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-09-07T06:13:53.2586267Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-09-07T06:13:53.2587342Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-09-07T06:13:53.2588571Z * [new branch] mlazos/backup-test-branch -> origin/mlazos/backup-test-branch 2025-09-07T06:13:53.2589622Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-09-07T06:13:53.2590912Z * [new branch] mlazos/baseline -> origin/mlazos/baseline 2025-09-07T06:13:53.2592090Z * [new branch] mlazos/baseline-graph-breaks -> 
origin/mlazos/baseline-graph-breaks 2025-09-07T06:13:53.2593142Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-09-07T06:13:53.2594441Z * [new branch] mlazos/better-msg -> origin/mlazos/better-msg 2025-09-07T06:13:53.2595891Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-09-07T06:13:53.2596919Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-09-07T06:13:53.2598118Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-09-07T06:13:53.2599734Z * [new branch] mlazos/ck2 -> origin/mlazos/ck2 2025-09-07T06:13:53.2601435Z * [new branch] mlazos/combokernels -> origin/mlazos/combokernels 2025-09-07T06:13:53.2602561Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-09-07T06:13:53.2603575Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-09-07T06:13:53.2604815Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-09-07T06:13:53.2605981Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-09-07T06:13:53.2607170Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-09-07T06:13:53.2608469Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-09-07T06:13:53.2609566Z * [new branch] mlazos/data-gather -> origin/mlazos/data-gather 2025-09-07T06:13:53.2610698Z * [new branch] mlazos/data-ptrs2 -> origin/mlazos/data-ptrs2 2025-09-07T06:13:53.2612057Z * [new branch] mlazos/data-ptrs3 -> origin/mlazos/data-ptrs3 2025-09-07T06:13:53.2613296Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-09-07T06:13:53.2614430Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-09-07T06:13:53.2615639Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-09-07T06:13:53.2616653Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-09-07T06:13:53.2617865Z * [new branch] mlazos/disable-closures -> origin/mlazos/disable-closures 
2025-09-07T06:13:53.2619000Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-09-07T06:13:53.2620023Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-09-07T06:13:53.2621377Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-09-07T06:13:53.2622486Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-09-07T06:13:53.2623883Z * [new branch] mlazos/exp_disable -> origin/mlazos/exp_disable 2025-09-07T06:13:53.2625048Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-09-07T06:13:53.2626146Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-09-07T06:13:53.2627348Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-09-07T06:13:53.2628498Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-09-07T06:13:53.2629671Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-09-07T06:13:53.2630687Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-09-07T06:13:53.2632207Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-09-07T06:13:53.2633341Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-09-07T06:13:53.2634551Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-09-07T06:13:53.2635726Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-09-07T06:13:53.2636887Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-09-07T06:13:53.2638055Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-09-07T06:13:53.2639171Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-09-07T06:13:53.2640332Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-09-07T06:13:53.2641402Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-09-07T06:13:53.2642535Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-09-07T06:13:53.2643716Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-09-07T06:13:53.2644870Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 
2025-09-07T06:13:53.2646085Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-09-07T06:13:53.2647203Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-09-07T06:13:53.2648330Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-09-07T06:13:53.2649860Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-09-07T06:13:53.2651111Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-09-07T06:13:53.2652630Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-09-07T06:13:53.2653744Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-09-07T06:13:53.2654912Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-09-07T06:13:53.2656059Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-09-07T06:13:53.2657203Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-09-07T06:13:53.2658229Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-09-07T06:13:53.2659506Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-09-07T06:13:53.2660655Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-09-07T06:13:53.2661820Z * [new branch] mlazos/init-per-param -> origin/mlazos/init-per-param 2025-09-07T06:13:53.2662932Z * [new branch] mlazos/init_per_param -> origin/mlazos/init_per_param 2025-09-07T06:13:53.2664155Z * [new branch] mlazos/less-guards -> origin/mlazos/less-guards 2025-09-07T06:13:53.2665427Z * [new branch] mlazos/lr-composibility -> origin/mlazos/lr-composibility 2025-09-07T06:13:53.2666287Z * [new branch] mlazos/main -> origin/mlazos/main 2025-09-07T06:13:53.2667651Z * [new branch] mlazos/main-test-enablement -> origin/mlazos/main-test-enablement 2025-09-07T06:13:53.2668672Z * [new branch] mlazos/main2 -> origin/mlazos/main2 2025-09-07T06:13:53.2669913Z * [new branch] mlazos/mark-static-update -> origin/mlazos/mark-static-update 2025-09-07T06:13:53.2670939Z * [new branch] mlazos/mcg -> origin/mlazos/mcg 2025-09-07T06:13:53.2672056Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-09-07T06:13:53.2673220Z * [new branch] mlazos/meta-guards -> 
origin/mlazos/meta-guards 2025-09-07T06:13:53.2674634Z * [new branch] mlazos/mlazos/ck2 -> origin/mlazos/mlazos/ck2 2025-09-07T06:13:53.2675911Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-09-07T06:13:53.2677087Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-09-07T06:13:53.2678141Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-09-07T06:13:53.2679394Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-09-07T06:13:53.2680953Z * [new branch] mlazos/more-tests -> origin/mlazos/more-tests 2025-09-07T06:13:53.2682057Z * [new branch] mlazos/no-cpp -> origin/mlazos/no-cpp 2025-09-07T06:13:53.2683454Z * [new branch] mlazos/no-init-group-handling -> origin/mlazos/no-init-group-handling 2025-09-07T06:13:53.2684491Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-09-07T06:13:53.2685647Z * [new branch] mlazos/opt-bench-exp2 -> origin/mlazos/opt-bench-exp2 2025-09-07T06:13:53.2686747Z * [new branch] mlazos/opt-incr -> origin/mlazos/opt-incr 2025-09-07T06:13:53.2687885Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-09-07T06:13:53.2688999Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-09-07T06:13:53.2690154Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-09-07T06:13:53.2691409Z * [new branch] mlazos/revert-inline -> origin/mlazos/revert-inline 2025-09-07T06:13:53.2692822Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-09-07T06:13:53.2693841Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-09-07T06:13:53.2695002Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-09-07T06:13:53.2696178Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-09-07T06:13:53.2697401Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-09-07T06:13:53.2698575Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 
2025-09-07T06:13:53.2699755Z * [new branch] mlazos/sub-param-fix -> origin/mlazos/sub-param-fix 2025-09-07T06:13:53.2700933Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-09-07T06:13:53.2702295Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-09-07T06:13:53.2703328Z * [new branch] mlazos/test -> origin/mlazos/test 2025-09-07T06:13:53.2704601Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-09-07T06:13:53.2705849Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-09-07T06:13:53.2706972Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-09-07T06:13:53.2708276Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-09-07T06:13:53.2709391Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-09-07T06:13:53.2710524Z * [new branch] mlazos/topo-fix -> origin/mlazos/topo-fix 2025-09-07T06:13:53.2712104Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-09-07T06:13:53.2713358Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-09-07T06:13:53.2714546Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-09-07T06:13:53.2715512Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-09-07T06:13:53.2716703Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-09-07T06:13:53.2717823Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-09-07T06:13:53.2718964Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-09-07T06:13:53.2720155Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-09-07T06:13:53.2721370Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-09-07T06:13:53.2722612Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-09-07T06:13:53.2723804Z * [new branch] modify-setupvllm -> origin/modify-setupvllm 2025-09-07T06:13:53.2725029Z * [new branch] module-shim 
-> origin/module-shim 2025-09-07T06:13:53.2726299Z * [new branch] move-theme-out-docker -> origin/move-theme-out-docker 2025-09-07T06:13:53.2727905Z * [new branch] msaroufim/be1 -> origin/msaroufim/be1 2025-09-07T06:13:53.2729094Z * [new branch] msaroufim/cn_path -> origin/msaroufim/cn_path 2025-09-07T06:13:53.2730276Z * [new branch] msaroufim/dtensorfusedadam -> origin/msaroufim/dtensorfusedadam 2025-09-07T06:13:53.2731466Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-09-07T06:13:53.2733291Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-09-07T06:13:53.2734950Z * [new branch] muon_dev -> origin/muon_dev 2025-09-07T06:13:53.2736206Z * [new branch] muon_dev_1 -> origin/muon_dev_1 2025-09-07T06:13:53.2737498Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-09-07T06:13:53.2738841Z * [new branch] nativert_numoutputs -> origin/nativert_numoutputs 2025-09-07T06:13:53.2740178Z * [new branch] new-modifiy-setupvllm -> origin/new-modifiy-setupvllm 2025-09-07T06:13:53.2741347Z * [new branch] new-setupvllm -> origin/new-setupvllm 2025-09-07T06:13:53.2742567Z * [new branch] new_zeros_dtype -> origin/new_zeros_dtype 2025-09-07T06:13:53.2743990Z * [new branch] newtest-base -> origin/newtest-base 2025-09-07T06:13:53.2745562Z * [new branch] ngimel/cat_perf1 -> origin/ngimel/cat_perf1 2025-09-07T06:13:53.2746748Z * [new branch] ngimel/einsum_fix -> origin/ngimel/einsum_fix 2025-09-07T06:13:53.2747768Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-09-07T06:13:53.2748930Z * [new branch] ngimel/fabric_check -> origin/ngimel/fabric_check 2025-09-07T06:13:53.2751504Z * [new branch] ngimel/fabric_fix -> origin/ngimel/fabric_fix 2025-09-07T06:13:53.2752730Z * [new branch] ngimel/fix_driver_init_error -> origin/ngimel/fix_driver_init_error 2025-09-07T06:13:53.2754514Z * [new branch] ngimel/fix_nccl_segment_seg -> origin/ngimel/fix_nccl_segment_seg 2025-09-07T06:13:53.2755775Z * [new branch] 
ngimel/gg_new -> origin/ngimel/gg_new 2025-09-07T06:13:53.2757190Z * [new branch] ngimel/modeguard -> origin/ngimel/modeguard 2025-09-07T06:13:53.2758671Z * [new branch] ngimel/multicast_fix -> origin/ngimel/multicast_fix 2025-09-07T06:13:53.2759896Z * [new branch] ngimel/rocm_handle_type -> origin/ngimel/rocm_handle_type 2025-09-07T06:13:53.2761305Z * [new branch] ngimel/symm_handle_fabric -> origin/ngimel/symm_handle_fabric 2025-09-07T06:13:53.2762401Z * [new branch] ngimel/unbind_multimem -> origin/ngimel/unbind_multimem 2025-09-07T06:13:53.2763604Z * [new branch] nightly -> origin/nightly 2025-09-07T06:13:53.2764931Z * [new branch] nmacchioni-patch-10 -> origin/nmacchioni-patch-10 2025-09-07T06:13:53.2766214Z * [new branch] nmacchioni-patch-7 -> origin/nmacchioni-patch-7 2025-09-07T06:13:53.2767578Z * [new branch] nmacchioni-patch-8 -> origin/nmacchioni-patch-8 2025-09-07T06:13:53.2768845Z * [new branch] nmacchioni-patch-9 -> origin/nmacchioni-patch-9 2025-09-07T06:13:53.2770373Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-09-07T06:13:53.2771660Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-09-07T06:13:53.2773067Z * [new branch] one-off -> origin/one-off 2025-09-07T06:13:53.2775100Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-09-07T06:13:53.2776343Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-09-07T06:13:53.2777547Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-09-07T06:13:53.2779060Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-09-07T06:13:53.2780315Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-09-07T06:13:53.2781847Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-09-07T06:13:53.2783116Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-09-07T06:13:53.2784426Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-09-07T06:13:53.2785570Z * [new branch] 
orig/release/2.0 -> origin/orig/release/2.0 2025-09-07T06:13:53.2786731Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-09-07T06:13:53.2787980Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-09-07T06:13:53.2789120Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-09-07T06:13:53.2790373Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-09-07T06:13:53.2791468Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-09-07T06:13:53.2792590Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-09-07T06:13:53.2794034Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-09-07T06:13:53.2795742Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-09-07T06:13:53.2797264Z * [new branch] oulgen/fx_graph -> origin/oulgen/fx_graph 2025-09-07T06:13:53.2798572Z * [new branch] padded-tensor -> origin/padded-tensor 2025-09-07T06:13:53.2799839Z * [new branch] pca2 -> origin/pca2 2025-09-07T06:13:53.2801182Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-09-07T06:13:53.2802854Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-09-07T06:13:53.2803806Z * [new branch] pianpwk/invalidate_fake_memo -> origin/pianpwk/invalidate_fake_memo 2025-09-07T06:13:53.2804898Z * [new branch] pianpwk/max_1_strides -> origin/pianpwk/max_1_strides 2025-09-07T06:13:53.2805942Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-09-07T06:13:53.2807011Z * [new branch] pianpwk/nonzero_memo -> origin/pianpwk/nonzero_memo 2025-09-07T06:13:53.2808435Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-09-07T06:13:53.2809845Z * [new branch] pianpwk/oblivious_slice_forward -> origin/pianpwk/oblivious_slice_forward 2025-09-07T06:13:53.2810952Z * [new branch] pianpwk/oblivious_where -> origin/pianpwk/oblivious_where 2025-09-07T06:13:53.2812417Z * [new branch] 
pianpwk/param_static_pgo -> origin/pianpwk/param_static_pgo 2025-09-07T06:13:53.2813551Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-09-07T06:13:53.2814818Z * [new branch] pianpwk/remove_guard_fail_break -> origin/pianpwk/remove_guard_fail_break 2025-09-07T06:13:53.2815905Z * [new branch] pianpwk/slice_fresh_symbols -> origin/pianpwk/slice_fresh_symbols 2025-09-07T06:13:53.2817115Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-09-07T06:13:53.2818506Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-09-07T06:13:53.2819478Z * [new branch] pianpwk/test_slice_fake_impl -> origin/pianpwk/test_slice_fake_impl 2025-09-07T06:13:53.2820768Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-09-07T06:13:53.2821896Z * [new branch] pianpwk/unbacked_channels_last -> origin/pianpwk/unbacked_channels_last 2025-09-07T06:13:53.2823092Z * [new branch] pianpwk/unbacked_safe_conv1d -> origin/pianpwk/unbacked_safe_conv1d 2025-09-07T06:13:53.2824294Z * [new branch] pianpwk/unbacked_sdpa_flash -> origin/pianpwk/unbacked_sdpa_flash 2025-09-07T06:13:53.2825479Z * [new branch] pianpwk/unbacked_should_swap -> origin/pianpwk/unbacked_should_swap 2025-09-07T06:13:53.2826672Z * [new branch] pianpwk/unbacked_should_swap_2 -> origin/pianpwk/unbacked_should_swap_2 2025-09-07T06:13:53.2827797Z * [new branch] pianpwk/unbacked_slice_binding -> origin/pianpwk/unbacked_slice_binding 2025-09-07T06:13:53.2828894Z * [new branch] pianpwk/unbacked_slice_forward -> origin/pianpwk/unbacked_slice_forward 2025-09-07T06:13:53.2829912Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-09-07T06:13:53.2831007Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-09-07T06:13:53.2832107Z * [new branch] pianpwk/whitelist_optimizer -> origin/pianpwk/whitelist_optimizer 2025-09-07T06:13:53.2833310Z * [new branch] 
pin-torchao -> origin/pin-torchao 2025-09-07T06:13:53.2835106Z * [new branch] piz/fall_back_missing_0716 -> origin/piz/fall_back_missing_0716 2025-09-07T06:13:53.2836222Z * [new branch] piz/improve_scatter_0808 -> origin/piz/improve_scatter_0808 2025-09-07T06:13:53.2837303Z * [new branch] pool-separate -> origin/pool-separate 2025-09-07T06:13:53.2838506Z * [new branch] pr-156087 -> origin/pr-156087 2025-09-07T06:13:53.2840176Z * [new branch] pr/131860 -> origin/pr/131860 2025-09-07T06:13:53.2841453Z * [new branch] predispatch_to -> origin/predispatch_to 2025-09-07T06:13:53.2842744Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-09-07T06:13:53.2844900Z * [new branch] pyobjectslot -> origin/pyobjectslot 2025-09-07T06:13:53.2846531Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-09-07T06:13:53.2848017Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-09-07T06:13:53.2849338Z * [new branch] quint-bits -> origin/quint-bits 2025-09-07T06:13:53.2851170Z * [new branch] release/1.10 -> origin/release/1.10 2025-09-07T06:13:53.2852592Z * [new branch] release/1.11 -> origin/release/1.11 2025-09-07T06:13:53.2853991Z * [new branch] release/1.12 -> origin/release/1.12 2025-09-07T06:13:53.2855242Z * [new branch] release/1.13 -> origin/release/1.13 2025-09-07T06:13:53.2856336Z * [new branch] release/1.4 -> origin/release/1.4 2025-09-07T06:13:53.2857313Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-09-07T06:13:53.2858520Z * [new branch] release/1.5 -> origin/release/1.5 2025-09-07T06:13:53.2859765Z * [new branch] release/1.6 -> origin/release/1.6 2025-09-07T06:13:53.2861019Z * [new branch] release/1.7 -> origin/release/1.7 2025-09-07T06:13:53.2862352Z * [new branch] release/1.8 -> origin/release/1.8 2025-09-07T06:13:53.2863623Z * [new branch] release/1.9 -> origin/release/1.9 2025-09-07T06:13:53.2864849Z * [new branch] release/2.0 -> origin/release/2.0 2025-09-07T06:13:53.2866089Z * [new branch] 
release/2.1 -> origin/release/2.1 2025-09-07T06:13:53.2867258Z * [new branch] release/2.2 -> origin/release/2.2 2025-09-07T06:13:53.2868761Z * [new branch] release/2.3 -> origin/release/2.3 2025-09-07T06:13:53.2870341Z * [new branch] release/2.4 -> origin/release/2.4 2025-09-07T06:13:53.2871906Z * [new branch] release/2.5 -> origin/release/2.5 2025-09-07T06:13:53.2873168Z * [new branch] release/2.6 -> origin/release/2.6 2025-09-07T06:13:53.2875004Z * [new branch] release/2.7 -> origin/release/2.7 2025-09-07T06:13:53.2876232Z * [new branch] release/2.8 -> origin/release/2.8 2025-09-07T06:13:53.2877490Z * [new branch] release_notes -> origin/release_notes 2025-09-07T06:13:53.2878782Z * [new branch] remove-actionable-label -> origin/remove-actionable-label 2025-09-07T06:13:53.2879909Z * [new branch] remove-ao -> origin/remove-ao 2025-09-07T06:13:53.2881430Z * [new branch] removedeprecatedvllmtest -> origin/removedeprecatedvllmtest 2025-09-07T06:13:53.2882671Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-09-07T06:13:53.2883658Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-09-07T06:13:53.2884704Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-09-07T06:13:53.2886037Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-09-07T06:13:53.2887168Z * [new branch] replace-pytorch-labs-20250812-204125 -> origin/replace-pytorch-labs-20250812-204125 2025-09-07T06:13:53.2888324Z * [new branch] replace-pytorch-labs-20250812-205624 -> origin/replace-pytorch-labs-20250812-205624 2025-09-07T06:13:53.2891028Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-09-07T06:13:53.2893716Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-09-07T06:13:53.2895949Z 
* [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-09-07T06:13:53.2897583Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-09-07T06:13:53.2898501Z * [new branch] rocm-monitoring -> origin/rocm-monitoring 2025-09-07T06:13:53.2900154Z * [new branch] ruisi/relax_memory -> origin/ruisi/relax_memory 2025-09-07T06:13:53.2901917Z * [new branch] run-torchbench-smoke-test-h100 -> origin/run-torchbench-smoke-test-h100 2025-09-07T06:13:53.2903866Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-09-07T06:13:53.2904816Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-09-07T06:13:53.2906511Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-09-07T06:13:53.2907540Z * [new branch] rzou/njt -> origin/rzou/njt 2025-09-07T06:13:53.2908722Z * [new branch] rzou/pca -> origin/rzou/pca 2025-09-07T06:13:53.2909794Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-09-07T06:13:53.2910862Z * [new branch] rzou/setup_context -> origin/rzou/setup_context 2025-09-07T06:13:53.2912700Z * [new branch] sanchitintel/refactor_aten_int8_woq_gemm -> origin/sanchitintel/refactor_aten_int8_woq_gemm 2025-09-07T06:13:53.2914079Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-09-07T06:13:53.2915126Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-09-07T06:13:53.2916417Z * [new branch] save -> origin/save 2025-09-07T06:13:53.2918029Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-09-07T06:13:53.2919373Z * [new branch] seemethere-patch-1 -> origin/seemethere-patch-1 2025-09-07T06:13:53.2920574Z * [new branch] setupvllm -> origin/setupvllm 2025-09-07T06:13:53.2921802Z * [new branch] share_and_pin_fork -> 
origin/share_and_pin_fork 2025-09-07T06:13:53.2923439Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-09-07T06:13:53.2924635Z * [new branch] shikaili_fp8_allgather -> origin/shikaili_fp8_allgather 2025-09-07T06:13:53.2925938Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-09-07T06:13:53.2927106Z * [new branch] shoumikhin-patch-12 -> origin/shoumikhin-patch-12 2025-09-07T06:13:53.2928397Z * [new branch] simplify-fq-per-channel -> origin/simplify-fq-per-channel 2025-09-07T06:13:53.2929566Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-09-07T06:13:53.2931097Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-09-07T06:13:53.2933134Z * [new branch] sqzhang/flight4 -> origin/sqzhang/flight4 2025-09-07T06:13:53.2934274Z * [new branch] sqzhang/flight4plus -> origin/sqzhang/flight4plus 2025-09-07T06:13:53.2935941Z * [new branch] sraikund/record_funct_test -> origin/sraikund/record_funct_test 2025-09-07T06:13:53.2937521Z * [new branch] sraikund16/test -> origin/sraikund16/test 2025-09-07T06:13:53.2938945Z * [new branch] stablize-compilation-time -> origin/stablize-compilation-time 2025-09-07T06:13:53.2940142Z * [new branch] standalone-templates -> origin/standalone-templates 2025-09-07T06:13:53.2941461Z * [new branch] standalone_package_weights -> origin/standalone_package_weights 2025-09-07T06:13:53.2942692Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-09-07T06:13:53.2944002Z * [new branch] subgraph_fuse -> origin/subgraph_fuse 2025-09-07T06:13:53.2945409Z * [new branch] support-uv-in-collect_env -> origin/support-uv-in-collect_env 2025-09-07T06:13:53.2946478Z * [new branch] sve-poc -> origin/sve-poc 2025-09-07T06:13:53.2947717Z * [new branch] svekars-patch-1 -> origin/svekars-patch-1 2025-09-07T06:13:53.2949254Z * [new branch] switch-bn -> origin/switch-bn 2025-09-07T06:13:53.2950741Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 
2025-09-07T06:13:53.2952360Z * [new branch] tenpercent/ck_rocm_ci_v3 -> origin/tenpercent/ck_rocm_ci_v3 2025-09-07T06:13:53.2953632Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-09-07T06:13:53.2954813Z * [new branch] test-7054 -> origin/test-7054 2025-09-07T06:13:53.2956279Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-09-07T06:13:53.2957613Z * [new branch] test-myst-markdown-docstring -> origin/test-myst-markdown-docstring 2025-09-07T06:13:53.2958800Z * [new branch] test-old -> origin/test-old 2025-09-07T06:13:53.2960563Z * [new branch] test-vec-migration-internally -> origin/test-vec-migration-internally 2025-09-07T06:13:53.2962130Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-09-07T06:13:53.2963247Z * [new branch] test/inductor -> origin/test/inductor 2025-09-07T06:13:53.2964907Z * [new branch] tianren/flex_paged_attn_fix -> origin/tianren/flex_paged_attn_fix 2025-09-07T06:13:53.2966022Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-09-07T06:13:53.2967027Z * [new branch] tianren/test -> origin/tianren/test 2025-09-07T06:13:53.2968278Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-09-07T06:13:53.2969465Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-09-07T06:13:53.2970698Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-09-07T06:13:53.2972165Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-09-07T06:13:53.2973430Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-09-07T06:13:53.2974730Z * [new branch] tree_vec_base -> origin/tree_vec_base 2025-09-07T06:13:53.2976012Z * [new branch] triton-update -> origin/triton-update 2025-09-07T06:13:53.2977303Z * [new branch] triton_kernel -> origin/triton_kernel 2025-09-07T06:13:53.2978453Z * [new branch] triton_kernel_perf -> origin/triton_kernel_perf 2025-09-07T06:13:53.2979721Z 
* [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-09-07T06:13:53.2981204Z * [new branch] tweak-transformer-dependabot -> origin/tweak-transformer-dependabot 2025-09-07T06:13:53.2982238Z * [new branch] type_dec -> origin/type_dec 2025-09-07T06:13:53.2983753Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-09-07T06:13:53.2985571Z * [new branch] update-audio-commit-hash/16818882925-1712-1 -> origin/update-audio-commit-hash/16818882925-1712-1 2025-09-07T06:13:53.2986618Z * [new branch] update-audio-commit-hash/16895560422-1720-1 -> origin/update-audio-commit-hash/16895560422-1720-1 2025-09-07T06:13:53.2987672Z * [new branch] update-audio-commit-hash/16924174496-1738-1 -> origin/update-audio-commit-hash/16924174496-1738-1 2025-09-07T06:13:53.2988917Z * [new branch] update-audio-commit-hash/17002010821-1749-1 -> origin/update-audio-commit-hash/17002010821-1749-1 2025-09-07T06:13:53.2989887Z * [new branch] update-audio-commit-hash/17056004427-1766-1 -> origin/update-audio-commit-hash/17056004427-1766-1 2025-09-07T06:13:53.2991255Z * [new branch] update-audio-commit-hash/17085054029-1767-1 -> origin/update-audio-commit-hash/17085054029-1767-1 2025-09-07T06:13:53.2992626Z * [new branch] update-audio-commit-hash/17142507405-1771-1 -> origin/update-audio-commit-hash/17142507405-1771-1 2025-09-07T06:13:53.2994015Z * [new branch] update-audio-commit-hash/17168762740-1773-1 -> origin/update-audio-commit-hash/17168762740-1773-1 2025-09-07T06:13:53.2995241Z * [new branch] update-audio-commit-hash/17311174639-1780-1 -> origin/update-audio-commit-hash/17311174639-1780-1 2025-09-07T06:13:53.2996423Z * [new branch] update-audio-commit-hash/17336898740-1781-1 -> origin/update-audio-commit-hash/17336898740-1781-1 2025-09-07T06:13:53.2997468Z * [new branch] update-audio-commit-hash/17389727684-1786-1 -> origin/update-audio-commit-hash/17389727684-1786-1 2025-09-07T06:13:53.2998679Z * [new branch] update-audio-commit-hash/17449538142-1790-1 -> 
origin/update-audio-commit-hash/17449538142-1790-1 2025-09-07T06:13:53.2999852Z * [new branch] update-audio-commit-hash/17507351808-1794-1 -> origin/update-audio-commit-hash/17507351808-1794-1 2025-09-07T06:13:53.3000911Z * [new branch] update-dynamic-shapes-doc -> origin/update-dynamic-shapes-doc 2025-09-07T06:13:53.3002754Z * [new branch] update-executorch-commit-hash/15694981040-1626-1 -> origin/update-executorch-commit-hash/15694981040-1626-1 2025-09-07T06:13:53.3004191Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-09-07T06:13:53.3005663Z * [new branch] update-vision-commit-hash/15336342773-1607-1 -> origin/update-vision-commit-hash/15336342773-1607-1 2025-09-07T06:13:53.3007194Z * [new branch] update-vllm-commit-hash/16737365217-1704-1 -> origin/update-vllm-commit-hash/16737365217-1704-1 2025-09-07T06:13:53.3008349Z * [new branch] update-vllm-commit-hash/16843157111-1713-1 -> origin/update-vllm-commit-hash/16843157111-1713-1 2025-09-07T06:13:53.3009363Z * [new branch] update-vllm-commit-hash/16855312394-1714-1 -> origin/update-vllm-commit-hash/16855312394-1714-1 2025-09-07T06:13:53.3010414Z * [new branch] update-vllm-commit-hash/16924174496-1738-1 -> origin/update-vllm-commit-hash/16924174496-1738-1 2025-09-07T06:13:53.3011673Z * [new branch] update-vllm-commit-hash/16952608705-1745-1 -> origin/update-vllm-commit-hash/16952608705-1745-1 2025-09-07T06:13:53.3013313Z * [new branch] update-vllm-commit-hash/16979836546-1748-1 -> origin/update-vllm-commit-hash/16979836546-1748-1 2025-09-07T06:13:53.3014710Z * [new branch] update-vllm-commit-hash/17014576881-1756-1 -> origin/update-vllm-commit-hash/17014576881-1756-1 2025-09-07T06:13:53.3016226Z * [new branch] update-vllm-commit-hash/17027830869-1761-1 -> origin/update-vllm-commit-hash/17027830869-1761-1 2025-09-07T06:13:53.3017493Z * [new branch] update-vllm-commit-hash/17056004427-1766-1 -> origin/update-vllm-commit-hash/17056004427-1766-1 
2025-09-07T06:13:53.3018585Z * [new branch] update-vllm-commit-hash/17085054029-1767-1 -> origin/update-vllm-commit-hash/17085054029-1767-1 2025-09-07T06:13:53.3019819Z * [new branch] update-vllm-commit-hash/17113610216-1768-1 -> origin/update-vllm-commit-hash/17113610216-1768-1 2025-09-07T06:13:53.3020929Z * [new branch] update-vllm-commit-hash/17142507405-1771-1 -> origin/update-vllm-commit-hash/17142507405-1771-1 2025-09-07T06:13:53.3022182Z * [new branch] update-vllm-commit-hash/17181878974-1774-1 -> origin/update-vllm-commit-hash/17181878974-1774-1 2025-09-07T06:13:53.3023400Z * [new branch] update-vllm-commit-hash/17311174639-1780-1 -> origin/update-vllm-commit-hash/17311174639-1780-1 2025-09-07T06:13:53.3024548Z * [new branch] update-vllm-commit-hash/17336898740-1781-1 -> origin/update-vllm-commit-hash/17336898740-1781-1 2025-09-07T06:13:53.3025741Z * [new branch] update-vllm-commit-hash/17364352302-1785-1 -> origin/update-vllm-commit-hash/17364352302-1785-1 2025-09-07T06:13:53.3026828Z * [new branch] update-vllm-commit-hash/17389727684-1786-1 -> origin/update-vllm-commit-hash/17389727684-1786-1 2025-09-07T06:13:53.3028005Z * [new branch] update-vllm-commit-hash/17449538142-1790-1 -> origin/update-vllm-commit-hash/17449538142-1790-1 2025-09-07T06:13:53.3029159Z * [new branch] update-vllm-commit-hash/17480069797-1791-1 -> origin/update-vllm-commit-hash/17480069797-1791-1 2025-09-07T06:13:53.3030150Z * [new branch] update-vllm-commit-hash/17507351808-1794-1 -> origin/update-vllm-commit-hash/17507351808-1794-1 2025-09-07T06:13:53.3031931Z * [new branch] update-xla-commit-hash/16873912760-198-1 -> origin/update-xla-commit-hash/16873912760-198-1 2025-09-07T06:13:53.3032965Z * [new branch] update-xla-commit-hash/17034266655-199-1 -> origin/update-xla-commit-hash/17034266655-199-1 2025-09-07T06:13:53.3034097Z * [new branch] update-xla-commit-hash/17202464405-200-1 -> origin/update-xla-commit-hash/17202464405-200-1 2025-09-07T06:13:53.3035279Z * [new branch] 
update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-09-07T06:13:53.3036390Z * [new branch] update_executorch_pin -> origin/update_executorch_pin 2025-09-07T06:13:53.3037632Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-09-07T06:13:53.3038862Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-09-07T06:13:53.3040065Z * [new branch] update_slow_tests_1752478971 -> origin/update_slow_tests_1752478971 2025-09-07T06:13:53.3041271Z * [new branch] update_slow_tests_1755502951 -> origin/update_slow_tests_1755502951 2025-09-07T06:13:53.3042447Z * [new branch] update_slow_tests_1756107664 -> origin/update_slow_tests_1756107664 2025-09-07T06:13:53.3043664Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-09-07T06:13:53.3044885Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-09-07T06:13:53.3046137Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-09-07T06:13:53.3047360Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-09-07T06:13:53.3048951Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-09-07T06:13:53.3050735Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-09-07T06:13:53.3052175Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-09-07T06:13:53.3053696Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-09-07T06:13:53.3054927Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-09-07T06:13:53.3056267Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-09-07T06:13:53.3057604Z * [new branch] validate_fn -> origin/validate_fn 2025-09-07T06:13:53.3059018Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-09-07T06:13:53.3060365Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-09-07T06:13:53.3061970Z * [new branch] viable/strict -> origin/viable/strict 2025-09-07T06:13:53.3063207Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-09-07T06:13:53.3065003Z * [new branch] 
vllmpin -> origin/vllmpin 2025-09-07T06:13:53.3066637Z * [new branch] wdvr/conda_devcontainer -> origin/wdvr/conda_devcontainer 2025-09-07T06:13:53.3067669Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-09-07T06:13:53.3069106Z * [new branch] weight_sharing_cpp -> origin/weight_sharing_cpp 2025-09-07T06:13:53.3071052Z * [new branch] whc/flight4 -> origin/whc/flight4 2025-09-07T06:13:53.3072210Z * [new branch] whc/flight51 -> origin/whc/flight51 2025-09-07T06:13:53.3073376Z * [new branch] whc/flight53 -> origin/whc/flight53 2025-09-07T06:13:53.3074567Z * [new branch] whc/stage2 -> origin/whc/stage2 2025-09-07T06:13:53.3075629Z * [new branch] whc/uneven -> origin/whc/uneven 2025-09-07T06:13:53.3077241Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-09-07T06:13:53.3078491Z * [new branch] win_warnings -> origin/win_warnings 2025-09-07T06:13:53.3079707Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-09-07T06:13:53.3080779Z * [new branch] workonoldcommit -> origin/workonoldcommit 2025-09-07T06:13:53.3082334Z * [new branch] wychi-autotune-prune-configs-by-shared-mem -> origin/wychi-autotune-prune-configs-by-shared-mem 2025-09-07T06:13:53.3083631Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-09-07T06:13:53.3084721Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-09-07T06:13:53.3085990Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-09-07T06:13:53.3086770Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-09-07T06:13:53.3087992Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-09-07T06:13:53.3089035Z * [new branch] xmfan/ca_api -> origin/xmfan/ca_api 2025-09-07T06:13:53.3090115Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-09-07T06:13:53.3091531Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-09-07T06:13:53.3093465Z * [new branch] 
xmfan/ca_cudagraphs -> origin/xmfan/ca_cudagraphs 2025-09-07T06:13:53.3094588Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-09-07T06:13:53.3095755Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-09-07T06:13:53.3096889Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-09-07T06:13:53.3098098Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-09-07T06:13:53.3099088Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-09-07T06:13:53.3100270Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-09-07T06:13:53.3101341Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-09-07T06:13:53.3102988Z * [new branch] xmfan/ca_mem_base -> origin/xmfan/ca_mem_base 2025-09-07T06:13:53.3104237Z * [new branch] xmfan/ca_mem_fix -> origin/xmfan/ca_mem_fix 2025-09-07T06:13:53.3105365Z * [new branch] xmfan/ca_memory_fix -> origin/xmfan/ca_memory_fix 2025-09-07T06:13:53.3106533Z * [new branch] xmfan/ca_memory_fix_rebased -> origin/xmfan/ca_memory_fix_rebased 2025-09-07T06:13:53.3107819Z * [new branch] xmfan/ca_memory_fix_rebased2 -> origin/xmfan/ca_memory_fix_rebased2 2025-09-07T06:13:53.3108893Z * [new branch] xmfan/ca_move_to_cuda -> origin/xmfan/ca_move_to_cuda 2025-09-07T06:13:53.3109982Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-09-07T06:13:53.3111165Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-09-07T06:13:53.3112385Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-09-07T06:13:53.3113317Z * [new branch] xmfan/ca_scalar -> origin/xmfan/ca_scalar 2025-09-07T06:13:53.3114692Z * [new branch] xmfan/ca_subclass_mem_fix -> origin/xmfan/ca_subclass_mem_fix 2025-09-07T06:13:53.3115808Z * [new branch] xmfan/ca_warm_mem -> origin/xmfan/ca_warm_mem 2025-09-07T06:13:53.3116930Z * [new branch] xmfan/ca_warm_mem_base -> origin/xmfan/ca_warm_mem_base 2025-09-07T06:13:53.3118009Z * [new branch] xmfan/cacu_jun18 -> 
origin/xmfan/cacu_jun18 2025-09-07T06:13:53.3119147Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-09-07T06:13:53.3120241Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-09-07T06:13:53.3121429Z * [new branch] xmfan/cacu_may27 -> origin/xmfan/cacu_may27 2025-09-07T06:13:53.3122584Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-09-07T06:13:53.3123735Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-09-07T06:13:53.3124775Z * [new branch] xmfan/issue_123374 -> origin/xmfan/issue_123374 2025-09-07T06:13:53.3126357Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T06:13:53.3127480Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T06:13:53.3128384Z * [new branch] xmfan/segfault_test -> origin/xmfan/segfault_test 2025-09-07T06:13:53.3129524Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-09-07T06:13:53.3130675Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-09-07T06:13:53.3132193Z * [new branch] xmfan/test -> origin/xmfan/test 2025-09-07T06:13:53.3133927Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-09-07T06:13:53.3135059Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-09-07T06:13:53.3136196Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-09-07T06:13:53.3137402Z * [new branch] yihan_quantization -> origin/yihan_quantization 2025-09-07T06:13:53.3139110Z * [new branch] yiming/add_jit_trace_benchmark -> origin/yiming/add_jit_trace_benchmark 2025-09-07T06:13:53.3140220Z * [new branch] yiming/add_nativert_benchmark -> origin/yiming/add_nativert_benchmark 2025-09-07T06:13:53.3141311Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 
2025-09-07T06:13:53.3142932Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-09-07T06:13:53.3144377Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-09-07T06:13:53.3145386Z * [new branch] zainr/git-push-v2 -> origin/zainr/git-push-v2 2025-09-07T06:13:53.3146505Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-09-07T06:13:53.3147551Z * [new branch] zainr/test -> origin/zainr/test 2025-09-07T06:13:53.3148579Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-09-07T06:13:53.3150248Z * [new branch] zainr/unstable -> origin/zainr/unstable 2025-09-07T06:13:53.3151322Z * [new branch] zainr/unstable-xla -> origin/zainr/unstable-xla 2025-09-07T06:13:53.3153156Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-09-07T06:13:53.3154377Z * [new branch] zb2p -> origin/zb2p 2025-09-07T06:13:53.3155757Z * [new branch] zero_grad_optimization -> origin/zero_grad_optimization 2025-09-07T06:13:53.3156972Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-09-07T06:13:53.3158838Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-09-07T06:13:53.3160608Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-09-07T06:13:53.3162339Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-09-07T06:13:53.3163660Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 2025-09-07T06:13:53.3164335Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-09-07T06:13:53.3165470Z * [new tag] ciflow/binaries/156049 -> ciflow/binaries/156049 2025-09-07T06:13:53.3166146Z * [new tag] ciflow/binaries/156712 -> ciflow/binaries/156712 2025-09-07T06:13:53.3166882Z * [new tag] ciflow/binaries/157432 -> ciflow/binaries/157432 2025-09-07T06:13:53.3167675Z * [new tag] ciflow/binaries/157685 -> ciflow/binaries/157685 2025-09-07T06:13:53.3168386Z * [new tag] ciflow/binaries/157689 -> 
ciflow/binaries/157689 2025-09-07T06:13:53.3169094Z * [new tag] ciflow/binaries/158104 -> ciflow/binaries/158104 2025-09-07T06:13:53.3169957Z * [new tag] ciflow/binaries/160229 -> ciflow/binaries/160229 2025-09-07T06:13:53.3170768Z * [new tag] ciflow/binaries/160720 -> ciflow/binaries/160720 2025-09-07T06:13:53.3171573Z * [new tag] ciflow/binaries/162080 -> ciflow/binaries/162080 2025-09-07T06:13:53.3173079Z * [new tag] ciflow/binaries/162329 -> ciflow/binaries/162329 2025-09-07T06:13:53.3173971Z * [new tag] ciflow/binaries_libtorch/156049 -> ciflow/binaries_libtorch/156049 2025-09-07T06:13:53.3174729Z * [new tag] ciflow/binaries_libtorch/156711 -> ciflow/binaries_libtorch/156711 2025-09-07T06:13:53.3175495Z * [new tag] ciflow/binaries_libtorch/157432 -> ciflow/binaries_libtorch/157432 2025-09-07T06:13:53.3176315Z * [new tag] ciflow/binaries_wheel/156049 -> ciflow/binaries_wheel/156049 2025-09-07T06:13:53.3177092Z * [new tag] ciflow/binaries_wheel/156711 -> ciflow/binaries_wheel/156711 2025-09-07T06:13:53.3177811Z * [new tag] ciflow/binaries_wheel/157432 -> ciflow/binaries_wheel/157432 2025-09-07T06:13:53.3178559Z * [new tag] ciflow/binaries_wheel/162136 -> ciflow/binaries_wheel/162136 2025-09-07T06:13:53.3179493Z * [new tag] ciflow/binaries_wheel/162252 -> ciflow/binaries_wheel/162252 2025-09-07T06:13:53.3180231Z * [new tag] ciflow/binaries_wheel/162325 -> ciflow/binaries_wheel/162325 2025-09-07T06:13:53.3181336Z * [new tag] ciflow/h100-distributed/156703 -> ciflow/h100-distributed/156703 2025-09-07T06:13:53.3182139Z * [new tag] ciflow/h100-symm-mem/157635 -> ciflow/h100-symm-mem/157635 2025-09-07T06:13:53.3182899Z * [new tag] ciflow/h100-symm-mem/161984 -> ciflow/h100-symm-mem/161984 2025-09-07T06:13:53.3183868Z * [new tag] ciflow/h100-symm-mem/162003 -> ciflow/h100-symm-mem/162003 2025-09-07T06:13:53.3184624Z * [new tag] ciflow/h100-symm-mem/162011 -> ciflow/h100-symm-mem/162011 2025-09-07T06:13:53.3185356Z * [new tag] ciflow/h100-symm-mem/162026 -> 
ciflow/h100-symm-mem/162026 2025-09-07T06:13:53.3186063Z * [new tag] ciflow/h100-symm-mem/162033 -> ciflow/h100-symm-mem/162033 2025-09-07T06:13:53.3186774Z * [new tag] ciflow/h100-symm-mem/162040 -> ciflow/h100-symm-mem/162040 2025-09-07T06:13:53.3187495Z * [new tag] ciflow/h100-symm-mem/162041 -> ciflow/h100-symm-mem/162041 2025-09-07T06:13:53.3188247Z * [new tag] ciflow/h100-symm-mem/162142 -> ciflow/h100-symm-mem/162142 2025-09-07T06:13:53.3188956Z * [new tag] ciflow/h100-symm-mem/162150 -> ciflow/h100-symm-mem/162150 2025-09-07T06:13:53.3189856Z * [new tag] ciflow/h100-symm-mem/162243 -> ciflow/h100-symm-mem/162243 2025-09-07T06:13:53.3190702Z * [new tag] ciflow/h100-symm-mem/162320 -> ciflow/h100-symm-mem/162320 2025-09-07T06:13:53.3191636Z * [new tag] ciflow/h100/159158 -> ciflow/h100/159158 2025-09-07T06:13:53.3192914Z * [new tag] ciflow/h100/160480 -> ciflow/h100/160480 2025-09-07T06:13:53.3193682Z * [new tag] ciflow/h100/161749 -> ciflow/h100/161749 2025-09-07T06:13:53.3194523Z * [new tag] ciflow/h100/162022 -> ciflow/h100/162022 2025-09-07T06:13:53.3195242Z * [new tag] ciflow/h100/162278 -> ciflow/h100/162278 2025-09-07T06:13:53.3196614Z * [new tag] ciflow/inductor-perf-test-nightly-rocm/156592 -> ciflow/inductor-perf-test-nightly-rocm/156592 2025-09-07T06:13:53.3197537Z * [new tag] ciflow/inductor-perf-test-nightly/156592 -> ciflow/inductor-perf-test-nightly/156592 2025-09-07T06:13:53.3198893Z * [new tag] ciflow/inductor-periodic/162063 -> ciflow/inductor-periodic/162063 2025-09-07T06:13:53.3199618Z * [new tag] ciflow/inductor-periodic/162227 -> ciflow/inductor-periodic/162227 2025-09-07T06:13:53.3200440Z * [new tag] ciflow/inductor-periodic/162323 -> ciflow/inductor-periodic/162323 2025-09-07T06:13:53.3201445Z * [new tag] ciflow/inductor-rocm/154170 -> ciflow/inductor-rocm/154170 2025-09-07T06:13:53.3202546Z * [new tag] ciflow/inductor-rocm/159146 -> ciflow/inductor-rocm/159146 2025-09-07T06:13:53.3203158Z * [new tag] ciflow/inductor-rocm/159158 -> 
ciflow/inductor-rocm/159158 2025-09-07T06:13:53.3204036Z * [new tag] ciflow/inductor-rocm/161715 -> ciflow/inductor-rocm/161715 2025-09-07T06:13:53.3204945Z * [new tag] ciflow/inductor-rocm/162053 -> ciflow/inductor-rocm/162053 2025-09-07T06:13:53.3205957Z * [new tag] ciflow/inductor-rocm/162056 -> ciflow/inductor-rocm/162056 2025-09-07T06:13:53.3206755Z * [new tag] ciflow/inductor/137400 -> ciflow/inductor/137400 2025-09-07T06:13:53.3207470Z * [new tag] ciflow/inductor/148180 -> ciflow/inductor/148180 2025-09-07T06:13:53.3208161Z * [new tag] ciflow/inductor/148328 -> ciflow/inductor/148328 2025-09-07T06:13:53.3208887Z * [new tag] ciflow/inductor/148484 -> ciflow/inductor/148484 2025-09-07T06:13:53.3209702Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-09-07T06:13:53.3210300Z * [new tag] ciflow/inductor/152624 -> ciflow/inductor/152624 2025-09-07T06:13:53.3211026Z * [new tag] ciflow/inductor/154694 -> ciflow/inductor/154694 2025-09-07T06:13:53.3212041Z * [new tag] ciflow/inductor/156049 -> ciflow/inductor/156049 2025-09-07T06:13:53.3212824Z * [new tag] ciflow/inductor/156592 -> ciflow/inductor/156592 2025-09-07T06:13:53.3213624Z * [new tag] ciflow/inductor/157635 -> ciflow/inductor/157635 2025-09-07T06:13:53.3214336Z * [new tag] ciflow/inductor/157685 -> ciflow/inductor/157685 2025-09-07T06:13:53.3215484Z * [new tag] ciflow/inductor/157686 -> ciflow/inductor/157686 2025-09-07T06:13:53.3216546Z * [new tag] ciflow/inductor/157689 -> ciflow/inductor/157689 2025-09-07T06:13:53.3217417Z * [new tag] ciflow/inductor/157699 -> ciflow/inductor/157699 2025-09-07T06:13:53.3218390Z * [new tag] ciflow/inductor/157743 -> ciflow/inductor/157743 2025-09-07T06:13:53.3219293Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-09-07T06:13:53.3220077Z * [new tag] ciflow/inductor/158091 -> ciflow/inductor/158091 2025-09-07T06:13:53.3220863Z * [new tag] ciflow/inductor/158104 -> ciflow/inductor/158104 2025-09-07T06:13:53.3221846Z * [new tag] 
ciflow/inductor/158404 -> ciflow/inductor/158404 2025-09-07T06:13:53.3222549Z * [new tag] ciflow/inductor/158647 -> ciflow/inductor/158647 2025-09-07T06:13:53.3223568Z * [new tag] ciflow/inductor/158932 -> ciflow/inductor/158932 2025-09-07T06:13:53.3224395Z * [new tag] ciflow/inductor/159146 -> ciflow/inductor/159146 2025-09-07T06:13:53.3225128Z * [new tag] ciflow/inductor/159158 -> ciflow/inductor/159158 2025-09-07T06:13:53.3226174Z * [new tag] ciflow/inductor/159274 -> ciflow/inductor/159274 2025-09-07T06:13:53.3226866Z * [new tag] ciflow/inductor/159664 -> ciflow/inductor/159664 2025-09-07T06:13:53.3227840Z * [new tag] ciflow/inductor/159778 -> ciflow/inductor/159778 2025-09-07T06:13:53.3228631Z * [new tag] ciflow/inductor/159835 -> ciflow/inductor/159835 2025-09-07T06:13:53.3229661Z * [new tag] ciflow/inductor/159944 -> ciflow/inductor/159944 2025-09-07T06:13:53.3230511Z * [new tag] ciflow/inductor/160161 -> ciflow/inductor/160161 2025-09-07T06:13:53.3231276Z * [new tag] ciflow/inductor/160174 -> ciflow/inductor/160174 2025-09-07T06:13:53.3232287Z * [new tag] ciflow/inductor/160323 -> ciflow/inductor/160323 2025-09-07T06:13:53.3233351Z * [new tag] ciflow/inductor/160324 -> ciflow/inductor/160324 2025-09-07T06:13:53.3241638Z * [new tag] ciflow/inductor/160325 -> ciflow/inductor/160325 2025-09-07T06:13:53.3241958Z * [new tag] ciflow/inductor/160326 -> ciflow/inductor/160326 2025-09-07T06:13:53.3242188Z * [new tag] ciflow/inductor/160327 -> ciflow/inductor/160327 2025-09-07T06:13:53.3242404Z * [new tag] ciflow/inductor/160328 -> ciflow/inductor/160328 2025-09-07T06:13:53.3242607Z * [new tag] ciflow/inductor/160329 -> ciflow/inductor/160329 2025-09-07T06:13:53.3242807Z * [new tag] ciflow/inductor/160480 -> ciflow/inductor/160480 2025-09-07T06:13:53.3243008Z * [new tag] ciflow/inductor/160532 -> ciflow/inductor/160532 2025-09-07T06:13:53.3243220Z * [new tag] ciflow/inductor/160539 -> ciflow/inductor/160539 2025-09-07T06:13:53.3243565Z * [new tag] 
ciflow/inductor/160580 -> ciflow/inductor/160580 2025-09-07T06:13:53.3243776Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-09-07T06:13:53.3243988Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-09-07T06:13:53.3244485Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-09-07T06:13:53.3244956Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-09-07T06:13:53.3245702Z * [new tag] ciflow/inductor/160690 -> ciflow/inductor/160690 2025-09-07T06:13:53.3246433Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-09-07T06:13:53.3247211Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-09-07T06:13:53.3248677Z * [new tag] ciflow/inductor/160798 -> ciflow/inductor/160798 2025-09-07T06:13:53.3250050Z * [new tag] ciflow/inductor/160836 -> ciflow/inductor/160836 2025-09-07T06:13:53.3250729Z * [new tag] ciflow/inductor/160843 -> ciflow/inductor/160843 2025-09-07T06:13:53.3252125Z * [new tag] ciflow/inductor/160869 -> ciflow/inductor/160869 2025-09-07T06:13:53.3253018Z * [new tag] ciflow/inductor/160920 -> ciflow/inductor/160920 2025-09-07T06:13:53.3253637Z * [new tag] ciflow/inductor/160943 -> ciflow/inductor/160943 2025-09-07T06:13:53.3254407Z * [new tag] ciflow/inductor/161092 -> ciflow/inductor/161092 2025-09-07T06:13:53.3255187Z * [new tag] ciflow/inductor/161093 -> ciflow/inductor/161093 2025-09-07T06:13:53.3256173Z * [new tag] ciflow/inductor/161109 -> ciflow/inductor/161109 2025-09-07T06:13:53.3256914Z * [new tag] ciflow/inductor/161118 -> ciflow/inductor/161118 2025-09-07T06:13:53.3258060Z * [new tag] ciflow/inductor/161178 -> ciflow/inductor/161178 2025-09-07T06:13:53.3258891Z * [new tag] ciflow/inductor/161246 -> ciflow/inductor/161246 2025-09-07T06:13:53.3259648Z * [new tag] ciflow/inductor/161349 -> ciflow/inductor/161349 2025-09-07T06:13:53.3260443Z * [new tag] ciflow/inductor/161350 -> ciflow/inductor/161350 2025-09-07T06:13:53.3261241Z * [new tag] 
ciflow/inductor/161351 -> ciflow/inductor/161351 2025-09-07T06:13:53.3262225Z * [new tag] ciflow/inductor/161397 -> ciflow/inductor/161397 2025-09-07T06:13:53.3262928Z * [new tag] ciflow/inductor/161404 -> ciflow/inductor/161404 2025-09-07T06:13:53.3263857Z * [new tag] ciflow/inductor/161405 -> ciflow/inductor/161405 2025-09-07T06:13:53.3264671Z * [new tag] ciflow/inductor/161406 -> ciflow/inductor/161406 2025-09-07T06:13:53.3265744Z * [new tag] ciflow/inductor/161410 -> ciflow/inductor/161410 2025-09-07T06:13:53.3266440Z * [new tag] ciflow/inductor/161414 -> ciflow/inductor/161414 2025-09-07T06:13:53.3267511Z * [new tag] ciflow/inductor/161442 -> ciflow/inductor/161442 2025-09-07T06:13:53.3268236Z * [new tag] ciflow/inductor/161458 -> ciflow/inductor/161458 2025-09-07T06:13:53.3268988Z * [new tag] ciflow/inductor/161468 -> ciflow/inductor/161468 2025-09-07T06:13:53.3269738Z * [new tag] ciflow/inductor/161469 -> ciflow/inductor/161469 2025-09-07T06:13:53.3270694Z * [new tag] ciflow/inductor/161485 -> ciflow/inductor/161485 2025-09-07T06:13:53.3271513Z * [new tag] ciflow/inductor/161499 -> ciflow/inductor/161499 2025-09-07T06:13:53.3272299Z * [new tag] ciflow/inductor/161534 -> ciflow/inductor/161534 2025-09-07T06:13:53.3273085Z * [new tag] ciflow/inductor/161595 -> ciflow/inductor/161595 2025-09-07T06:13:53.3273951Z * [new tag] ciflow/inductor/161596 -> ciflow/inductor/161596 2025-09-07T06:13:53.3275236Z * [new tag] ciflow/inductor/161630 -> ciflow/inductor/161630 2025-09-07T06:13:53.3275952Z * [new tag] ciflow/inductor/161667 -> ciflow/inductor/161667 2025-09-07T06:13:53.3276719Z * [new tag] ciflow/inductor/161670 -> ciflow/inductor/161670 2025-09-07T06:13:53.3277487Z * [new tag] ciflow/inductor/161673 -> ciflow/inductor/161673 2025-09-07T06:13:53.3278249Z * [new tag] ciflow/inductor/161674 -> ciflow/inductor/161674 2025-09-07T06:13:53.3279015Z * [new tag] ciflow/inductor/161675 -> ciflow/inductor/161675 2025-09-07T06:13:53.3279785Z * [new tag] 
ciflow/inductor/161693 -> ciflow/inductor/161693 2025-09-07T06:13:53.3280532Z * [new tag] ciflow/inductor/161695 -> ciflow/inductor/161695 2025-09-07T06:13:53.3281297Z * [new tag] ciflow/inductor/161715 -> ciflow/inductor/161715 2025-09-07T06:13:53.3282072Z * [new tag] ciflow/inductor/161730 -> ciflow/inductor/161730 2025-09-07T06:13:53.3282841Z * [new tag] ciflow/inductor/161732 -> ciflow/inductor/161732 2025-09-07T06:13:53.3283788Z * [new tag] ciflow/inductor/161744 -> ciflow/inductor/161744 2025-09-07T06:13:53.3284603Z * [new tag] ciflow/inductor/161746 -> ciflow/inductor/161746 2025-09-07T06:13:53.3285386Z * [new tag] ciflow/inductor/161747 -> ciflow/inductor/161747 2025-09-07T06:13:53.3286139Z * [new tag] ciflow/inductor/161819 -> ciflow/inductor/161819 2025-09-07T06:13:53.3286907Z * [new tag] ciflow/inductor/161821 -> ciflow/inductor/161821 2025-09-07T06:13:53.3287695Z * [new tag] ciflow/inductor/161828 -> ciflow/inductor/161828 2025-09-07T06:13:53.3288427Z * [new tag] ciflow/inductor/161879 -> ciflow/inductor/161879 2025-09-07T06:13:53.3289193Z * [new tag] ciflow/inductor/161880 -> ciflow/inductor/161880 2025-09-07T06:13:53.3289953Z * [new tag] ciflow/inductor/161881 -> ciflow/inductor/161881 2025-09-07T06:13:53.3290959Z * [new tag] ciflow/inductor/161907 -> ciflow/inductor/161907 2025-09-07T06:13:53.3291888Z * [new tag] ciflow/inductor/161914 -> ciflow/inductor/161914 2025-09-07T06:13:53.3292926Z * [new tag] ciflow/inductor/161924 -> ciflow/inductor/161924 2025-09-07T06:13:53.3293786Z * [new tag] ciflow/inductor/161936 -> ciflow/inductor/161936 2025-09-07T06:13:53.3294618Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-09-07T06:13:53.3295441Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-09-07T06:13:53.3296243Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-09-07T06:13:53.3297024Z * [new tag] ciflow/inductor/161955 -> ciflow/inductor/161955 2025-09-07T06:13:53.3297838Z * [new tag] 
ciflow/inductor/161957 -> ciflow/inductor/161957 2025-09-07T06:13:53.3298613Z * [new tag] ciflow/inductor/161975 -> ciflow/inductor/161975 2025-09-07T06:13:53.3299384Z * [new tag] ciflow/inductor/161977 -> ciflow/inductor/161977 2025-09-07T06:13:53.3300172Z * [new tag] ciflow/inductor/161978 -> ciflow/inductor/161978 2025-09-07T06:13:53.3301001Z * [new tag] ciflow/inductor/161979 -> ciflow/inductor/161979 2025-09-07T06:13:53.3301739Z * [new tag] ciflow/inductor/161980 -> ciflow/inductor/161980 2025-09-07T06:13:53.3303020Z * [new tag] ciflow/inductor/161988 -> ciflow/inductor/161988 2025-09-07T06:13:53.3303945Z * [new tag] ciflow/inductor/161994 -> ciflow/inductor/161994 2025-09-07T06:13:53.3304644Z * [new tag] ciflow/inductor/162013 -> ciflow/inductor/162013 2025-09-07T06:13:53.3305395Z * [new tag] ciflow/inductor/162014 -> ciflow/inductor/162014 2025-09-07T06:13:53.3306153Z * [new tag] ciflow/inductor/162017 -> ciflow/inductor/162017 2025-09-07T06:13:53.3306959Z * [new tag] ciflow/inductor/162021 -> ciflow/inductor/162021 2025-09-07T06:13:53.3307751Z * [new tag] ciflow/inductor/162023 -> ciflow/inductor/162023 2025-09-07T06:13:53.3308480Z * [new tag] ciflow/inductor/162027 -> ciflow/inductor/162027 2025-09-07T06:13:53.3309229Z * [new tag] ciflow/inductor/162029 -> ciflow/inductor/162029 2025-09-07T06:13:53.3309998Z * [new tag] ciflow/inductor/162030 -> ciflow/inductor/162030 2025-09-07T06:13:53.3310843Z * [new tag] ciflow/inductor/162031 -> ciflow/inductor/162031 2025-09-07T06:13:53.3311688Z * [new tag] ciflow/inductor/162033 -> ciflow/inductor/162033 2025-09-07T06:13:53.3312733Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-09-07T06:13:53.3313442Z * [new tag] ciflow/inductor/162053 -> ciflow/inductor/162053 2025-09-07T06:13:53.3314190Z * [new tag] ciflow/inductor/162056 -> ciflow/inductor/162056 2025-09-07T06:13:53.3314965Z * [new tag] ciflow/inductor/162063 -> ciflow/inductor/162063 2025-09-07T06:13:53.3315727Z * [new tag] 
ciflow/inductor/162066 -> ciflow/inductor/162066 2025-09-07T06:13:53.3316478Z * [new tag] ciflow/inductor/162068 -> ciflow/inductor/162068 2025-09-07T06:13:53.3317496Z * [new tag] ciflow/inductor/162081 -> ciflow/inductor/162081 2025-09-07T06:13:53.3318218Z * [new tag] ciflow/inductor/162088 -> ciflow/inductor/162088 2025-09-07T06:13:53.3319004Z * [new tag] ciflow/inductor/162089 -> ciflow/inductor/162089 2025-09-07T06:13:53.3319740Z * [new tag] ciflow/inductor/162094 -> ciflow/inductor/162094 2025-09-07T06:13:53.3320513Z * [new tag] ciflow/inductor/162098 -> ciflow/inductor/162098 2025-09-07T06:13:53.3321305Z * [new tag] ciflow/inductor/162101 -> ciflow/inductor/162101 2025-09-07T06:13:53.3322054Z * [new tag] ciflow/inductor/162102 -> ciflow/inductor/162102 2025-09-07T06:13:53.3322812Z * [new tag] ciflow/inductor/162104 -> ciflow/inductor/162104 2025-09-07T06:13:53.3323604Z * [new tag] ciflow/inductor/162106 -> ciflow/inductor/162106 2025-09-07T06:13:53.3324383Z * [new tag] ciflow/inductor/162108 -> ciflow/inductor/162108 2025-09-07T06:13:53.3325156Z * [new tag] ciflow/inductor/162126 -> ciflow/inductor/162126 2025-09-07T06:13:53.3325975Z * [new tag] ciflow/inductor/162149 -> ciflow/inductor/162149 2025-09-07T06:13:53.3326716Z * [new tag] ciflow/inductor/162164 -> ciflow/inductor/162164 2025-09-07T06:13:53.3327482Z * [new tag] ciflow/inductor/162166 -> ciflow/inductor/162166 2025-09-07T06:13:53.3328273Z * [new tag] ciflow/inductor/162169 -> ciflow/inductor/162169 2025-09-07T06:13:53.3329034Z * [new tag] ciflow/inductor/162170 -> ciflow/inductor/162170 2025-09-07T06:13:53.3329799Z * [new tag] ciflow/inductor/162171 -> ciflow/inductor/162171 2025-09-07T06:13:53.3330570Z * [new tag] ciflow/inductor/162183 -> ciflow/inductor/162183 2025-09-07T06:13:53.3331402Z * [new tag] ciflow/inductor/162189 -> ciflow/inductor/162189 2025-09-07T06:13:53.3332461Z * [new tag] ciflow/inductor/162190 -> ciflow/inductor/162190 2025-09-07T06:13:53.3333331Z * [new tag] 
ciflow/inductor/162191 -> ciflow/inductor/162191 2025-09-07T06:13:53.3334018Z * [new tag] ciflow/inductor/162194 -> ciflow/inductor/162194 2025-09-07T06:13:53.3335108Z * [new tag] ciflow/inductor/162200 -> ciflow/inductor/162200 2025-09-07T06:13:53.3335816Z * [new tag] ciflow/inductor/162201 -> ciflow/inductor/162201 2025-09-07T06:13:53.3336705Z * [new tag] ciflow/inductor/162208 -> ciflow/inductor/162208 2025-09-07T06:13:53.3337718Z * [new tag] ciflow/inductor/162211 -> ciflow/inductor/162211 2025-09-07T06:13:53.3338467Z * [new tag] ciflow/inductor/162216 -> ciflow/inductor/162216 2025-09-07T06:13:53.3339241Z * [new tag] ciflow/inductor/162220 -> ciflow/inductor/162220 2025-09-07T06:13:53.3340247Z * [new tag] ciflow/inductor/162222 -> ciflow/inductor/162222 2025-09-07T06:13:53.3341001Z * [new tag] ciflow/inductor/162227 -> ciflow/inductor/162227 2025-09-07T06:13:53.3341776Z * [new tag] ciflow/inductor/162238 -> ciflow/inductor/162238 2025-09-07T06:13:53.3342601Z * [new tag] ciflow/inductor/162239 -> ciflow/inductor/162239 2025-09-07T06:13:53.3343361Z * [new tag] ciflow/inductor/162240 -> ciflow/inductor/162240 2025-09-07T06:13:53.3344260Z * [new tag] ciflow/inductor/162244 -> ciflow/inductor/162244 2025-09-07T06:13:53.3345009Z * [new tag] ciflow/inductor/162245 -> ciflow/inductor/162245 2025-09-07T06:13:53.3345800Z * [new tag] ciflow/inductor/162262 -> ciflow/inductor/162262 2025-09-07T06:13:53.3346575Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-09-07T06:13:53.3347318Z * [new tag] ciflow/inductor/162278 -> ciflow/inductor/162278 2025-09-07T06:13:53.3348111Z * [new tag] ciflow/inductor/162284 -> ciflow/inductor/162284 2025-09-07T06:13:53.3349099Z * [new tag] ciflow/inductor/162286 -> ciflow/inductor/162286 2025-09-07T06:13:53.3351852Z * [new tag] ciflow/inductor/162288 -> ciflow/inductor/162288 2025-09-07T06:13:53.3352592Z * [new tag] ciflow/inductor/162293 -> ciflow/inductor/162293 2025-09-07T06:13:53.3353389Z * [new tag] 
ciflow/inductor/162294 -> ciflow/inductor/162294 2025-09-07T06:13:53.3354198Z * [new tag] ciflow/inductor/162295 -> ciflow/inductor/162295 2025-09-07T06:13:53.3354981Z * [new tag] ciflow/inductor/162296 -> ciflow/inductor/162296 2025-09-07T06:13:53.3356009Z * [new tag] ciflow/inductor/162298 -> ciflow/inductor/162298 2025-09-07T06:13:53.3357211Z * [new tag] ciflow/inductor/162307 -> ciflow/inductor/162307 2025-09-07T06:13:53.3358025Z * [new tag] ciflow/inductor/162309 -> ciflow/inductor/162309 2025-09-07T06:13:53.3358756Z * [new tag] ciflow/inductor/162311 -> ciflow/inductor/162311 2025-09-07T06:13:53.3359583Z * [new tag] ciflow/inductor/162312 -> ciflow/inductor/162312 2025-09-07T06:13:53.3360382Z * [new tag] ciflow/inductor/162315 -> ciflow/inductor/162315 2025-09-07T06:13:53.3361179Z * [new tag] ciflow/inductor/162316 -> ciflow/inductor/162316 2025-09-07T06:13:53.3362044Z * [new tag] ciflow/inductor/162318 -> ciflow/inductor/162318 2025-09-07T06:13:53.3362871Z * [new tag] ciflow/inductor/162323 -> ciflow/inductor/162323 2025-09-07T06:13:53.3363634Z * [new tag] ciflow/inductor/162341 -> ciflow/inductor/162341 2025-09-07T06:13:53.3364412Z * [new tag] ciflow/inductor/162345 -> ciflow/inductor/162345 2025-09-07T06:13:53.3365596Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-09-07T06:13:53.3366689Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-09-07T06:13:53.3367440Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-09-07T06:13:53.3368446Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-09-07T06:13:53.3369069Z * [new tag] ciflow/linux-aarch64/159737 -> ciflow/linux-aarch64/159737 2025-09-07T06:13:53.3369794Z * [new tag] ciflow/linux-aarch64/160078 -> ciflow/linux-aarch64/160078 2025-09-07T06:13:53.3370647Z * [new tag] ciflow/mps/157553 -> ciflow/mps/157553 2025-09-07T06:13:53.3371429Z * [new tag] ciflow/mps/157635 -> ciflow/mps/157635 2025-09-07T06:13:53.3372415Z * [new tag] 
ciflow/mps/161988 -> ciflow/mps/161988 2025-09-07T06:13:53.3373154Z * [new tag] ciflow/mps/162108 -> ciflow/mps/162108 2025-09-07T06:13:53.3373907Z * [new tag] ciflow/mps/162153 -> ciflow/mps/162153 2025-09-07T06:13:53.3374650Z * [new tag] ciflow/mps/162281 -> ciflow/mps/162281 2025-09-07T06:13:53.3375588Z * [new tag] ciflow/nightly/156049 -> ciflow/nightly/156049 2025-09-07T06:13:53.3376327Z * [new tag] ciflow/nightly/158104 -> ciflow/nightly/158104 2025-09-07T06:13:53.3377373Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-09-07T06:13:53.3378458Z * [new tag] ciflow/periodic-rocm-mi300/161529 -> ciflow/periodic-rocm-mi300/161529 2025-09-07T06:13:53.3379237Z * [new tag] ciflow/periodic-rocm-mi300/161715 -> ciflow/periodic-rocm-mi300/161715 2025-09-07T06:13:53.3380326Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-09-07T06:13:53.3381040Z * [new tag] ciflow/periodic/156703 -> ciflow/periodic/156703 2025-09-07T06:13:53.3381753Z * [new tag] ciflow/periodic/161715 -> ciflow/periodic/161715 2025-09-07T06:13:53.3382473Z * [new tag] ciflow/periodic/162021 -> ciflow/periodic/162021 2025-09-07T06:13:53.3383184Z * [new tag] ciflow/periodic/162323 -> ciflow/periodic/162323 2025-09-07T06:13:53.3384314Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-09-07T06:13:53.3385137Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-09-07T06:13:53.3385966Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-09-07T06:13:53.3386977Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-09-07T06:13:53.3388100Z * [new tag] ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-09-07T06:13:53.3389216Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-09-07T06:13:53.3390441Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-09-07T06:13:53.3391531Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 
2025-09-07T06:13:53.3392602Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-09-07T06:13:53.3393661Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-09-07T06:13:53.3394734Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-09-07T06:13:53.3395537Z * [new tag] ciflow/rocm-mi300/154170 -> ciflow/rocm-mi300/154170 2025-09-07T06:13:53.3396495Z * [new tag] ciflow/rocm-mi300/158747 -> ciflow/rocm-mi300/158747 2025-09-07T06:13:53.3397089Z * [new tag] ciflow/rocm-mi300/159146 -> ciflow/rocm-mi300/159146 2025-09-07T06:13:53.3397879Z * [new tag] ciflow/rocm-mi300/159158 -> ciflow/rocm-mi300/159158 2025-09-07T06:13:53.3398539Z * [new tag] ciflow/rocm-mi300/161715 -> ciflow/rocm-mi300/161715 2025-09-07T06:13:53.3399251Z * [new tag] ciflow/rocm-mi300/161957 -> ciflow/rocm-mi300/161957 2025-09-07T06:13:53.3399965Z * [new tag] ciflow/rocm-mi300/162053 -> ciflow/rocm-mi300/162053 2025-09-07T06:13:53.3400686Z * [new tag] ciflow/rocm-mi300/162056 -> ciflow/rocm-mi300/162056 2025-09-07T06:13:53.3401521Z * [new tag] ciflow/rocm-mi300/162112 -> ciflow/rocm-mi300/162112 2025-09-07T06:13:53.3402257Z * [new tag] ciflow/rocm-mi300/162245 -> ciflow/rocm-mi300/162245 2025-09-07T06:13:53.3402981Z * [new tag] ciflow/rocm-mi300/162278 -> ciflow/rocm-mi300/162278 2025-09-07T06:13:53.3404010Z * [new tag] ciflow/rocm-mi300/162288 -> ciflow/rocm-mi300/162288 2025-09-07T06:13:53.3404871Z * [new tag] ciflow/rocm-mi355/162053 -> ciflow/rocm-mi355/162053 2025-09-07T06:13:53.3405590Z * [new tag] ciflow/rocm-mi355/162056 -> ciflow/rocm-mi355/162056 2025-09-07T06:13:53.3406509Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-09-07T06:13:53.3407145Z * [new tag] ciflow/rocm/154170 -> ciflow/rocm/154170 2025-09-07T06:13:53.3408155Z * [new tag] ciflow/rocm/156491 -> ciflow/rocm/156491 2025-09-07T06:13:53.3408800Z * [new tag] ciflow/rocm/156592 -> ciflow/rocm/156592 2025-09-07T06:13:53.3409548Z * [new tag] 
ciflow/rocm/158747 -> ciflow/rocm/158747 2025-09-07T06:13:53.3410258Z * [new tag] ciflow/rocm/159146 -> ciflow/rocm/159146 2025-09-07T06:13:53.3411282Z * [new tag] ciflow/rocm/159158 -> ciflow/rocm/159158 2025-09-07T06:13:53.3412426Z * [new tag] ciflow/rocm/161715 -> ciflow/rocm/161715 2025-09-07T06:13:53.3413390Z * [new tag] ciflow/rocm/161972 -> ciflow/rocm/161972 2025-09-07T06:13:53.3414095Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-09-07T06:13:53.3415383Z * [new tag] ciflow/rocm/162053 -> ciflow/rocm/162053 2025-09-07T06:13:53.3416339Z * [new tag] ciflow/rocm/162056 -> ciflow/rocm/162056 2025-09-07T06:13:53.3417376Z * [new tag] ciflow/rocm/162112 -> ciflow/rocm/162112 2025-09-07T06:13:53.3418321Z * [new tag] ciflow/rocm/162278 -> ciflow/rocm/162278 2025-09-07T06:13:53.3419032Z * [new tag] ciflow/rocm/162288 -> ciflow/rocm/162288 2025-09-07T06:13:53.3419808Z * [new tag] ciflow/rocm/162305 -> ciflow/rocm/162305 2025-09-07T06:13:53.3420987Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-09-07T06:13:53.3421806Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-09-07T06:13:53.3423298Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-09-07T06:13:53.3423852Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-09-07T06:13:53.3424603Z * [new tag] ciflow/slow/161395 -> ciflow/slow/161395 2025-09-07T06:13:53.3425557Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-09-07T06:13:53.3426501Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-09-07T06:13:53.3427425Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-09-07T06:13:53.3428557Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-09-07T06:13:53.3429625Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-09-07T06:13:53.3430636Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-09-07T06:13:53.3431428Z * [new tag] 
ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-09-07T06:13:53.3432361Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-09-07T06:13:53.3433808Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-09-07T06:13:53.3434300Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-09-07T06:13:53.3435254Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-09-07T06:13:53.3436113Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-09-07T06:13:53.3437083Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-09-07T06:13:53.3438054Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-09-07T06:13:53.3439489Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-09-07T06:13:53.3440022Z * [new tag] ciflow/triton_binaries/162329 -> ciflow/triton_binaries/162329 2025-09-07T06:13:53.3440933Z * [new tag] ciflow/trunk/113258 -> ciflow/trunk/113258 2025-09-07T06:13:53.3441687Z * [new tag] ciflow/trunk/137400 -> ciflow/trunk/137400 2025-09-07T06:13:53.3442363Z * [new tag] ciflow/trunk/148180 -> ciflow/trunk/148180 2025-09-07T06:13:53.3443068Z * [new tag] ciflow/trunk/148328 -> ciflow/trunk/148328 2025-09-07T06:13:53.3443778Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-09-07T06:13:53.3444945Z * [new tag] ciflow/trunk/148919 -> ciflow/trunk/148919 2025-09-07T06:13:53.3445596Z * [new tag] ciflow/trunk/152624 -> ciflow/trunk/152624 2025-09-07T06:13:53.3446328Z * [new tag] ciflow/trunk/154170 -> ciflow/trunk/154170 2025-09-07T06:13:53.3447028Z * [new tag] ciflow/trunk/154694 -> ciflow/trunk/154694 2025-09-07T06:13:53.3447776Z * [new tag] ciflow/trunk/156049 -> ciflow/trunk/156049 2025-09-07T06:13:53.3448452Z * [new tag] ciflow/trunk/156703 -> ciflow/trunk/156703 2025-09-07T06:13:53.3449917Z * [new tag] ciflow/trunk/156711 -> ciflow/trunk/156711 
2025-09-07T06:13:53.3450978Z * [new tag] ciflow/trunk/157432 -> ciflow/trunk/157432 2025-09-07T06:13:53.3452045Z * [new tag] ciflow/trunk/157685 -> ciflow/trunk/157685 2025-09-07T06:13:53.3452766Z * [new tag] ciflow/trunk/157689 -> ciflow/trunk/157689 2025-09-07T06:13:53.3453563Z * [new tag] ciflow/trunk/157699 -> ciflow/trunk/157699 2025-09-07T06:13:53.3454323Z * [new tag] ciflow/trunk/157813 -> ciflow/trunk/157813 2025-09-07T06:13:53.3455103Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-09-07T06:13:53.3455887Z * [new tag] ciflow/trunk/158091 -> ciflow/trunk/158091 2025-09-07T06:13:53.3456648Z * [new tag] ciflow/trunk/158104 -> ciflow/trunk/158104 2025-09-07T06:13:53.3457906Z * [new tag] ciflow/trunk/158404 -> ciflow/trunk/158404 2025-09-07T06:13:53.3458719Z * [new tag] ciflow/trunk/158647 -> ciflow/trunk/158647 2025-09-07T06:13:53.3459747Z * [new tag] ciflow/trunk/158846 -> ciflow/trunk/158846 2025-09-07T06:13:53.3460462Z * [new tag] ciflow/trunk/159158 -> ciflow/trunk/159158 2025-09-07T06:13:53.3461430Z * [new tag] ciflow/trunk/159682 -> ciflow/trunk/159682 2025-09-07T06:13:53.3462304Z * [new tag] ciflow/trunk/159835 -> ciflow/trunk/159835 2025-09-07T06:13:53.3463070Z * [new tag] ciflow/trunk/160161 -> ciflow/trunk/160161 2025-09-07T06:13:53.3463882Z * [new tag] ciflow/trunk/160236 -> ciflow/trunk/160236 2025-09-07T06:13:53.3464628Z * [new tag] ciflow/trunk/160329 -> ciflow/trunk/160329 2025-09-07T06:13:53.3465432Z * [new tag] ciflow/trunk/160480 -> ciflow/trunk/160480 2025-09-07T06:13:53.3466148Z * [new tag] ciflow/trunk/160532 -> ciflow/trunk/160532 2025-09-07T06:13:53.3466989Z * [new tag] ciflow/trunk/160836 -> ciflow/trunk/160836 2025-09-07T06:13:53.3467759Z * [new tag] ciflow/trunk/160843 -> ciflow/trunk/160843 2025-09-07T06:13:53.3468516Z * [new tag] ciflow/trunk/160869 -> ciflow/trunk/160869 2025-09-07T06:13:53.3469502Z * [new tag] ciflow/trunk/160940 -> ciflow/trunk/160940 2025-09-07T06:13:53.3470200Z * [new tag] ciflow/trunk/160943 -> 
ciflow/trunk/160943 2025-09-07T06:13:53.3471206Z * [new tag] ciflow/trunk/160953 -> ciflow/trunk/160953 2025-09-07T06:13:53.3472068Z * [new tag] ciflow/trunk/161035 -> ciflow/trunk/161035 2025-09-07T06:13:53.3472819Z * [new tag] ciflow/trunk/161178 -> ciflow/trunk/161178 2025-09-07T06:13:53.3473570Z * [new tag] ciflow/trunk/161349 -> ciflow/trunk/161349 2025-09-07T06:13:53.3474340Z * [new tag] ciflow/trunk/161350 -> ciflow/trunk/161350 2025-09-07T06:13:53.3475119Z * [new tag] ciflow/trunk/161351 -> ciflow/trunk/161351 2025-09-07T06:13:53.3475833Z * [new tag] ciflow/trunk/161395 -> ciflow/trunk/161395 2025-09-07T06:13:53.3476587Z * [new tag] ciflow/trunk/161405 -> ciflow/trunk/161405 2025-09-07T06:13:53.3477338Z * [new tag] ciflow/trunk/161406 -> ciflow/trunk/161406 2025-09-07T06:13:53.3478097Z * [new tag] ciflow/trunk/161410 -> ciflow/trunk/161410 2025-09-07T06:13:53.3478846Z * [new tag] ciflow/trunk/161468 -> ciflow/trunk/161468 2025-09-07T06:13:53.3479597Z * [new tag] ciflow/trunk/161499 -> ciflow/trunk/161499 2025-09-07T06:13:53.3480687Z * [new tag] ciflow/trunk/161527 -> ciflow/trunk/161527 2025-09-07T06:13:53.3481393Z * [new tag] ciflow/trunk/161534 -> ciflow/trunk/161534 2025-09-07T06:13:53.3482146Z * [new tag] ciflow/trunk/161591 -> ciflow/trunk/161591 2025-09-07T06:13:53.3482906Z * [new tag] ciflow/trunk/161595 -> ciflow/trunk/161595 2025-09-07T06:13:53.3483646Z * [new tag] ciflow/trunk/161596 -> ciflow/trunk/161596 2025-09-07T06:13:53.3484407Z * [new tag] ciflow/trunk/161633 -> ciflow/trunk/161633 2025-09-07T06:13:53.3485147Z * [new tag] ciflow/trunk/161634 -> ciflow/trunk/161634 2025-09-07T06:13:53.3485921Z * [new tag] ciflow/trunk/161635 -> ciflow/trunk/161635 2025-09-07T06:13:53.3486666Z * [new tag] ciflow/trunk/161667 -> ciflow/trunk/161667 2025-09-07T06:13:53.3487408Z * [new tag] ciflow/trunk/161670 -> ciflow/trunk/161670 2025-09-07T06:13:53.3488158Z * [new tag] ciflow/trunk/161692 -> ciflow/trunk/161692 2025-09-07T06:13:53.3488899Z * [new tag] 
ciflow/trunk/161693 -> ciflow/trunk/161693 2025-09-07T06:13:53.3489692Z * [new tag] ciflow/trunk/161695 -> ciflow/trunk/161695 2025-09-07T06:13:53.3490464Z * [new tag] ciflow/trunk/161730 -> ciflow/trunk/161730 2025-09-07T06:13:53.3491207Z * [new tag] ciflow/trunk/161744 -> ciflow/trunk/161744 2025-09-07T06:13:53.3492547Z * [new tag] ciflow/trunk/161749 -> ciflow/trunk/161749 2025-09-07T06:13:53.3493187Z * [new tag] ciflow/trunk/161881 -> ciflow/trunk/161881 2025-09-07T06:13:53.3493965Z * [new tag] ciflow/trunk/161924 -> ciflow/trunk/161924 2025-09-07T06:13:53.3495033Z * [new tag] ciflow/trunk/161926 -> ciflow/trunk/161926 2025-09-07T06:13:53.3495753Z * [new tag] ciflow/trunk/161936 -> ciflow/trunk/161936 2025-09-07T06:13:53.3496540Z * [new tag] ciflow/trunk/161952 -> ciflow/trunk/161952 2025-09-07T06:13:53.3497324Z * [new tag] ciflow/trunk/161955 -> ciflow/trunk/161955 2025-09-07T06:13:53.3498085Z * [new tag] ciflow/trunk/161957 -> ciflow/trunk/161957 2025-09-07T06:13:53.3498859Z * [new tag] ciflow/trunk/161959 -> ciflow/trunk/161959 2025-09-07T06:13:53.3499650Z * [new tag] ciflow/trunk/161977 -> ciflow/trunk/161977 2025-09-07T06:13:53.3500427Z * [new tag] ciflow/trunk/161988 -> ciflow/trunk/161988 2025-09-07T06:13:53.3501193Z * [new tag] ciflow/trunk/161994 -> ciflow/trunk/161994 2025-09-07T06:13:53.3502112Z * [new tag] ciflow/trunk/162007 -> ciflow/trunk/162007 2025-09-07T06:13:53.3502880Z * [new tag] ciflow/trunk/162013 -> ciflow/trunk/162013 2025-09-07T06:13:53.3503762Z * [new tag] ciflow/trunk/162017 -> ciflow/trunk/162017 2025-09-07T06:13:53.3504541Z * [new tag] ciflow/trunk/162021 -> ciflow/trunk/162021 2025-09-07T06:13:53.3505283Z * [new tag] ciflow/trunk/162022 -> ciflow/trunk/162022 2025-09-07T06:13:53.3506035Z * [new tag] ciflow/trunk/162040 -> ciflow/trunk/162040 2025-09-07T06:13:53.3506782Z * [new tag] ciflow/trunk/162041 -> ciflow/trunk/162041 2025-09-07T06:13:53.3507733Z * [new tag] ciflow/trunk/162062 -> ciflow/trunk/162062 
2025-09-07T06:13:53.3508441Z * [new tag] ciflow/trunk/162066 -> ciflow/trunk/162066 2025-09-07T06:13:53.3509651Z * [new tag] ciflow/trunk/162089 -> ciflow/trunk/162089 2025-09-07T06:13:53.3510384Z * [new tag] ciflow/trunk/162099 -> ciflow/trunk/162099 2025-09-07T06:13:53.3511142Z * [new tag] ciflow/trunk/162104 -> ciflow/trunk/162104 2025-09-07T06:13:53.3511943Z * [new tag] ciflow/trunk/162106 -> ciflow/trunk/162106 2025-09-07T06:13:53.3512703Z * [new tag] ciflow/trunk/162112 -> ciflow/trunk/162112 2025-09-07T06:13:53.3513474Z * [new tag] ciflow/trunk/162119 -> ciflow/trunk/162119 2025-09-07T06:13:53.3514203Z * [new tag] ciflow/trunk/162142 -> ciflow/trunk/162142 2025-09-07T06:13:53.3514984Z * [new tag] ciflow/trunk/162169 -> ciflow/trunk/162169 2025-09-07T06:13:53.3515743Z * [new tag] ciflow/trunk/162183 -> ciflow/trunk/162183 2025-09-07T06:13:53.3516492Z * [new tag] ciflow/trunk/162190 -> ciflow/trunk/162190 2025-09-07T06:13:53.3517240Z * [new tag] ciflow/trunk/162194 -> ciflow/trunk/162194 2025-09-07T06:13:53.3518203Z * [new tag] ciflow/trunk/162200 -> ciflow/trunk/162200 2025-09-07T06:13:53.3518898Z * [new tag] ciflow/trunk/162206 -> ciflow/trunk/162206 2025-09-07T06:13:53.3519646Z * [new tag] ciflow/trunk/162208 -> ciflow/trunk/162208 2025-09-07T06:13:53.3520473Z * [new tag] ciflow/trunk/162222 -> ciflow/trunk/162222 2025-09-07T06:13:53.3521210Z * [new tag] ciflow/trunk/162238 -> ciflow/trunk/162238 2025-09-07T06:13:53.3522029Z * [new tag] ciflow/trunk/162244 -> ciflow/trunk/162244 2025-09-07T06:13:53.3523046Z * [new tag] ciflow/trunk/162267 -> ciflow/trunk/162267 2025-09-07T06:13:53.3523850Z * [new tag] ciflow/trunk/162269 -> ciflow/trunk/162269 2025-09-07T06:13:53.3524609Z * [new tag] ciflow/trunk/162278 -> ciflow/trunk/162278 2025-09-07T06:13:53.3525375Z * [new tag] ciflow/trunk/162286 -> ciflow/trunk/162286 2025-09-07T06:13:53.3526384Z * [new tag] ciflow/trunk/162288 -> ciflow/trunk/162288 2025-09-07T06:13:53.3527092Z * [new tag] ciflow/trunk/162293 -> 
ciflow/trunk/162293 2025-09-07T06:13:53.3527833Z * [new tag] ciflow/trunk/162310 -> ciflow/trunk/162310 2025-09-07T06:13:53.3528579Z * [new tag] ciflow/trunk/162311 -> ciflow/trunk/162311 2025-09-07T06:13:53.3529339Z * [new tag] ciflow/trunk/162315 -> ciflow/trunk/162315 2025-09-07T06:13:53.3530095Z * [new tag] ciflow/trunk/162325 -> ciflow/trunk/162325 2025-09-07T06:13:53.3531066Z * [new tag] ciflow/trunk/162328 -> ciflow/trunk/162328 2025-09-07T06:13:53.3532087Z * [new tag] ciflow/trunk/162329 -> ciflow/trunk/162329 2025-09-07T06:13:53.3533342Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-09-07T06:13:53.3534201Z * [new tag] ciflow/vllm/162292 -> ciflow/vllm/162292 2025-09-07T06:13:53.3535213Z * [new tag] ciflow/win-arm64/156049 -> ciflow/win-arm64/156049 2025-09-07T06:13:53.3535915Z * [new tag] ciflow/win-arm64/158104 -> ciflow/win-arm64/158104 2025-09-07T06:13:53.3536779Z * [new tag] ciflow/xpu/157699 -> ciflow/xpu/157699 2025-09-07T06:13:53.3537507Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-09-07T06:13:53.3538460Z * [new tag] ciflow/xpu/159459 -> ciflow/xpu/159459 2025-09-07T06:13:53.3539237Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-09-07T06:13:53.3539882Z * [new tag] ciflow/xpu/159944 -> ciflow/xpu/159944 2025-09-07T06:13:53.3540720Z * [new tag] ciflow/xpu/160867 -> ciflow/xpu/160867 2025-09-07T06:13:53.3541720Z * [new tag] ciflow/xpu/160938 -> ciflow/xpu/160938 2025-09-07T06:13:53.3542436Z * [new tag] ciflow/xpu/160940 -> ciflow/xpu/160940 2025-09-07T06:13:53.3543160Z * [new tag] ciflow/xpu/160953 -> ciflow/xpu/160953 2025-09-07T06:13:53.3544106Z * [new tag] ciflow/xpu/161045 -> ciflow/xpu/161045 2025-09-07T06:13:53.3545141Z * [new tag] ciflow/xpu/161058 -> ciflow/xpu/161058 2025-09-07T06:13:53.3546309Z * [new tag] ciflow/xpu/161246 -> ciflow/xpu/161246 2025-09-07T06:13:53.3547288Z * [new tag] ciflow/xpu/161397 -> ciflow/xpu/161397 2025-09-07T06:13:53.3548224Z * [new tag] ciflow/xpu/161485 -> ciflow/xpu/161485 
2025-09-07T06:13:53.3549079Z * [new tag] ciflow/xpu/161988 -> ciflow/xpu/161988 2025-09-07T06:13:53.3552897Z * [new tag] ciflow/xpu/162062 -> ciflow/xpu/162062 2025-09-07T06:13:53.3553844Z * [new tag] cslpull75 -> cslpull75 2025-09-07T06:13:53.3554654Z * [new tag] cslpull76 -> cslpull76 2025-09-07T06:13:53.3555495Z * [new tag] cslpull77 -> cslpull77 2025-09-07T06:13:53.3556431Z * [new tag] cslpull78 -> cslpull78 2025-09-07T06:13:53.3557529Z * [new tag] cslpull79 -> cslpull79 2025-09-07T06:13:53.3558741Z * [new tag] cslpull80 -> cslpull80 2025-09-07T06:13:53.3559801Z * [new tag] cslpull81 -> cslpull81 2025-09-07T06:13:53.3560618Z * [new tag] cslpull82 -> cslpull82 2025-09-07T06:13:53.3561635Z * [new tag] cslpull83 -> cslpull83 2025-09-07T06:13:53.3562444Z * [new tag] cslpull84 -> cslpull84 2025-09-07T06:13:53.3563384Z * [new tag] cslpull85 -> cslpull85 2025-09-07T06:13:53.3564369Z * [new tag] cslpull86 -> cslpull86 2025-09-07T06:13:53.3565167Z * [new tag] cslpull87 -> cslpull87 2025-09-07T06:13:53.3566112Z * [new tag] cslpull88 -> cslpull88 2025-09-07T06:13:53.3567006Z * [new tag] cslpull89 -> cslpull89 2025-09-07T06:13:53.3567639Z * [new tag] cslpull90 -> cslpull90 2025-09-07T06:13:53.3568994Z * [new tag] cslpull91 -> cslpull91 2025-09-07T06:13:53.3569786Z * [new tag] cslpull92 -> cslpull92 2025-09-07T06:13:53.3570710Z * [new tag] flight_5 -> flight_5 2025-09-07T06:13:53.3572008Z * [new tag] flight_5.1 -> flight_5.1 2025-09-07T06:13:53.3572979Z * [new tag] flight_5.2 -> flight_5.2 2025-09-07T06:13:53.3573760Z * [new tag] flight_5.3 -> flight_5.3 2025-09-07T06:13:53.3574782Z * [new tag] forpull1 -> forpull1 2025-09-07T06:13:53.3575936Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-09-07T06:13:53.3577266Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-09-07T06:13:53.3578223Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-09-07T06:13:53.3579218Z * [new tag] nightly-binary -> nightly-binary 2025-09-07T06:13:53.3580039Z * [new tag] 
sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-09-07T06:13:53.3580995Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-09-07T06:13:53.3582375Z * [new tag] trunk/00636e0171e7e733628c408084805442270cf608 -> trunk/00636e0171e7e733628c408084805442270cf608 2025-09-07T06:13:53.3583318Z * [new tag] trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 -> trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 2025-09-07T06:13:53.3584660Z * [new tag] trunk/01ab325cc2e0dc221af4d710974e1b9175066544 -> trunk/01ab325cc2e0dc221af4d710974e1b9175066544 2025-09-07T06:13:53.3585709Z * [new tag] trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b -> trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b 2025-09-07T06:13:53.3586641Z * [new tag] trunk/040d00af048967dde7938d358d7f5988cbd18388 -> trunk/040d00af048967dde7938d358d7f5988cbd18388 2025-09-07T06:13:53.3587642Z * [new tag] trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 -> trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 2025-09-07T06:13:53.3588602Z * [new tag] trunk/047603d35bdc70046216384838d6340feab79bf4 -> trunk/047603d35bdc70046216384838d6340feab79bf4 2025-09-07T06:13:53.3589584Z * [new tag] trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 -> trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 2025-09-07T06:13:53.3590624Z * [new tag] trunk/081cab045472ce045634548cc6c14a4870641e23 -> trunk/081cab045472ce045634548cc6c14a4870641e23 2025-09-07T06:13:53.3591569Z * [new tag] trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 -> trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 2025-09-07T06:13:53.3592474Z * [new tag] trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 -> trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 2025-09-07T06:13:53.3593425Z * [new tag] trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 -> trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 2025-09-07T06:13:53.3594255Z * [new tag] trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 -> trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 2025-09-07T06:13:53.3595903Z * [new tag] 
trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 -> trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 2025-09-07T06:13:53.3597258Z * [new tag] trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 -> trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 2025-09-07T06:13:53.3598098Z * [new tag] trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c -> trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c 2025-09-07T06:13:53.3599287Z * [new tag] trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 -> trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 2025-09-07T06:13:53.3600113Z * [new tag] trunk/0d84ff3b78f55492d3d4708458c92d776274939e -> trunk/0d84ff3b78f55492d3d4708458c92d776274939e 2025-09-07T06:13:53.3600968Z * [new tag] trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 -> trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 2025-09-07T06:13:53.3601903Z * [new tag] trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f -> trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f 2025-09-07T06:13:53.3602825Z * [new tag] trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f -> trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f 2025-09-07T06:13:53.3603773Z * [new tag] trunk/12814701555d3e41dfcdf8f9273af5821e322df0 -> trunk/12814701555d3e41dfcdf8f9273af5821e322df0 2025-09-07T06:13:53.3604712Z * [new tag] trunk/13b65196db422bdb394cb482e208c61ed448898c -> trunk/13b65196db422bdb394cb482e208c61ed448898c 2025-09-07T06:13:53.3605690Z * [new tag] trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 -> trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 2025-09-07T06:13:53.3606652Z * [new tag] trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 -> trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 2025-09-07T06:13:53.3607566Z * [new tag] trunk/146371483318e17929daefd37c8e459d9d6d47bb -> trunk/146371483318e17929daefd37c8e459d9d6d47bb 2025-09-07T06:13:53.3608499Z * [new tag] trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 -> trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 2025-09-07T06:13:53.3609423Z * [new tag] trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 -> 
trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 2025-09-07T06:13:53.3611068Z * [new tag] trunk/190c391a28845a14df26abb228d26aa813efb20c -> trunk/190c391a28845a14df26abb228d26aa813efb20c 2025-09-07T06:13:53.3612295Z * [new tag] trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 -> trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 2025-09-07T06:13:53.3613307Z * [new tag] trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 -> trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 2025-09-07T06:13:53.3614086Z * [new tag] trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 -> trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 2025-09-07T06:13:53.3615041Z * [new tag] trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a -> trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a 2025-09-07T06:13:53.3615969Z * [new tag] trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 -> trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 2025-09-07T06:13:53.3616882Z * [new tag] trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 -> trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 2025-09-07T06:13:53.3617828Z * [new tag] trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a -> trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a 2025-09-07T06:13:53.3618796Z * [new tag] trunk/1f820de639c75a1562d3fb03f160439f853ae07b -> trunk/1f820de639c75a1562d3fb03f160439f853ae07b 2025-09-07T06:13:53.3619752Z * [new tag] trunk/204697f0e695d82894c5010fbec664c4391f90cc -> trunk/204697f0e695d82894c5010fbec664c4391f90cc 2025-09-07T06:13:53.3620773Z * [new tag] trunk/20629b1619fe636227d01fc85ba221daa7185a05 -> trunk/20629b1619fe636227d01fc85ba221daa7185a05 2025-09-07T06:13:53.3621625Z * [new tag] trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 -> trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 2025-09-07T06:13:53.3622607Z * [new tag] trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd -> trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd 2025-09-07T06:13:53.3623588Z * [new tag] trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 -> trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 
2025-09-07T06:13:53.3624577Z * [new tag] trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f -> trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f 2025-09-07T06:13:53.3625461Z * [new tag] trunk/25f4aaed9ec26f39c13862323ff8582006473d23 -> trunk/25f4aaed9ec26f39c13862323ff8582006473d23 2025-09-07T06:13:53.3626665Z * [new tag] trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 -> trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 2025-09-07T06:13:53.3627652Z * [new tag] trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f -> trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f 2025-09-07T06:13:53.3628600Z * [new tag] trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 -> trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 2025-09-07T06:13:53.3629550Z * [new tag] trunk/29280864d941e6108ab57f7298f520c0cf9696e9 -> trunk/29280864d941e6108ab57f7298f520c0cf9696e9 2025-09-07T06:13:53.3630451Z * [new tag] trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 -> trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 2025-09-07T06:13:53.3631451Z * [new tag] trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef -> trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef 2025-09-07T06:13:53.3632385Z * [new tag] trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c -> trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c 2025-09-07T06:13:53.3633303Z * [new tag] trunk/2ba65472dd54488a86a50326ea990195fc6732d6 -> trunk/2ba65472dd54488a86a50326ea990195fc6732d6 2025-09-07T06:13:53.3634174Z * [new tag] trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 -> trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 2025-09-07T06:13:53.3635044Z * [new tag] trunk/2dd529df0092799f68ee7afcf52338276906706a -> trunk/2dd529df0092799f68ee7afcf52338276906706a 2025-09-07T06:13:53.3636009Z * [new tag] trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 -> trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 2025-09-07T06:13:53.3636959Z * [new tag] trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 -> trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 2025-09-07T06:13:53.3637932Z * [new tag] 
trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 -> trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 2025-09-07T06:13:53.3638718Z * [new tag] trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 -> trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 2025-09-07T06:13:53.3639581Z * [new tag] trunk/34aa78274d6770086025a967fa63a86830e08176 -> trunk/34aa78274d6770086025a967fa63a86830e08176 2025-09-07T06:13:53.3640464Z * [new tag] trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 -> trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 2025-09-07T06:13:53.3641233Z * [new tag] trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b -> trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b 2025-09-07T06:13:53.3642013Z * [new tag] trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 -> trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 2025-09-07T06:13:53.3642919Z * [new tag] trunk/3771380f83fcac154a7c89ad679311d8c4818287 -> trunk/3771380f83fcac154a7c89ad679311d8c4818287 2025-09-07T06:13:53.3643828Z * [new tag] trunk/3a207816cc569f78863d86c01f2a3d265350e39f -> trunk/3a207816cc569f78863d86c01f2a3d265350e39f 2025-09-07T06:13:53.3644838Z * [new tag] trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 -> trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 2025-09-07T06:13:53.3645796Z * [new tag] trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 -> trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 2025-09-07T06:13:53.3646688Z * [new tag] trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f -> trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f 2025-09-07T06:13:53.3647574Z * [new tag] trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf -> trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf 2025-09-07T06:13:53.3648470Z * [new tag] trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 -> trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 2025-09-07T06:13:53.3649631Z * [new tag] trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d -> trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d 2025-09-07T06:13:53.3650843Z * [new tag] trunk/420c52ecf36f86d32da0853bfbe074b682b070aa -> 
trunk/420c52ecf36f86d32da0853bfbe074b682b070aa 2025-09-07T06:13:53.3651859Z * [new tag] trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 -> trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 2025-09-07T06:13:53.3652825Z * [new tag] trunk/451ed931562ec8b46d1f7e6c266a68132a119336 -> trunk/451ed931562ec8b46d1f7e6c266a68132a119336 2025-09-07T06:13:53.3653757Z * [new tag] trunk/480c7391126656154318fabf1d57ebc01e196e63 -> trunk/480c7391126656154318fabf1d57ebc01e196e63 2025-09-07T06:13:53.3654774Z * [new tag] trunk/48bedd753da22634aa94fbafeb731e82025404f3 -> trunk/48bedd753da22634aa94fbafeb731e82025404f3 2025-09-07T06:13:53.3655597Z * [new tag] trunk/494878a11b79071ada0b98f34042d47155be6d1c -> trunk/494878a11b79071ada0b98f34042d47155be6d1c 2025-09-07T06:13:53.3656591Z * [new tag] trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 -> trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 2025-09-07T06:13:53.3657584Z * [new tag] trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf -> trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf 2025-09-07T06:13:53.3658959Z * [new tag] trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e -> trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e 2025-09-07T06:13:53.3661227Z * [new tag] trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 -> trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 2025-09-07T06:13:53.3661662Z * [new tag] trunk/4f72d932feee0749397fec876dcd43994f50b215 -> trunk/4f72d932feee0749397fec876dcd43994f50b215 2025-09-07T06:13:53.3662337Z * [new tag] trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d -> trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d 2025-09-07T06:13:53.3662949Z * [new tag] trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 -> trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 2025-09-07T06:13:53.3663838Z * [new tag] trunk/524b78d4f67045b83bb69edc56ab16efe282971c -> trunk/524b78d4f67045b83bb69edc56ab16efe282971c 2025-09-07T06:13:53.3664741Z * [new tag] trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 -> trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 
2025-09-07T06:13:53.3665535Z * [new tag] trunk/5561e45758d59c94605873d5db48ed459c004c3b -> trunk/5561e45758d59c94605873d5db48ed459c004c3b 2025-09-07T06:13:53.3666605Z * [new tag] trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 -> trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 2025-09-07T06:13:53.3667658Z * [new tag] trunk/5927a70934ccf7b70182d364c23245a7dd685503 -> trunk/5927a70934ccf7b70182d364c23245a7dd685503 2025-09-07T06:13:53.3668597Z * [new tag] trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 -> trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 2025-09-07T06:13:53.3669566Z * [new tag] trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 -> trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 2025-09-07T06:13:53.3670672Z * [new tag] trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 -> trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 2025-09-07T06:13:53.3671410Z * [new tag] trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 -> trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 2025-09-07T06:13:53.3672297Z * [new tag] trunk/5da573c42c332bc68d4b7946c69f690a876d951a -> trunk/5da573c42c332bc68d4b7946c69f690a876d951a 2025-09-07T06:13:53.3673238Z * [new tag] trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 -> trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 2025-09-07T06:13:53.3674132Z * [new tag] trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 -> trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 2025-09-07T06:13:53.3675062Z * [new tag] trunk/600c25e9a17fe56e3dee872be8854db08916ba0c -> trunk/600c25e9a17fe56e3dee872be8854db08916ba0c 2025-09-07T06:13:53.3676006Z * [new tag] trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 -> trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 2025-09-07T06:13:53.3676919Z * [new tag] trunk/6087ef41e54c2494b117ffd923faf20f515a6806 -> trunk/6087ef41e54c2494b117ffd923faf20f515a6806 2025-09-07T06:13:53.3677861Z * [new tag] trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 -> trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 2025-09-07T06:13:53.3678737Z * [new tag] 
trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 -> trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 2025-09-07T06:13:53.3679658Z * [new tag] trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 -> trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 2025-09-07T06:13:53.3680620Z * [new tag] trunk/65985937d97505f648b6ed852c3129f2dd08b251 -> trunk/65985937d97505f648b6ed852c3129f2dd08b251 2025-09-07T06:13:53.3682213Z * [new tag] trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 -> trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 2025-09-07T06:13:53.3682907Z * [new tag] trunk/6737e2c996990024187ba620d2764f3b6f6add2c -> trunk/6737e2c996990024187ba620d2764f3b6f6add2c 2025-09-07T06:13:53.3683884Z * [new tag] trunk/67c31dcd364f10072a55f4a30ffd1151c686283a -> trunk/67c31dcd364f10072a55f4a30ffd1151c686283a 2025-09-07T06:13:53.3684833Z * [new tag] trunk/68738beff73e9c3512e18b4edea811a897ce42db -> trunk/68738beff73e9c3512e18b4edea811a897ce42db 2025-09-07T06:13:53.3685779Z * [new tag] trunk/69a25f68884a168550695fdb1a7c310c54d29536 -> trunk/69a25f68884a168550695fdb1a7c310c54d29536 2025-09-07T06:13:53.3686676Z * [new tag] trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f -> trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f 2025-09-07T06:13:53.3687563Z * [new tag] trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 -> trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 2025-09-07T06:13:53.3688505Z * [new tag] trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b -> trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b 2025-09-07T06:13:53.3689455Z * [new tag] trunk/70d36e047dfb3488fd6335016711a784d810ebda -> trunk/70d36e047dfb3488fd6335016711a784d810ebda 2025-09-07T06:13:53.3690339Z * [new tag] trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b -> trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b 2025-09-07T06:13:53.3691262Z * [new tag] trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 -> trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 2025-09-07T06:13:53.3692582Z * [new tag] trunk/73eb4511fb863a37944342b7e92aae706de603c8 -> 
trunk/73eb4511fb863a37944342b7e92aae706de603c8 2025-09-07T06:13:53.3693633Z * [new tag] trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b -> trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b 2025-09-07T06:13:53.3694576Z * [new tag] trunk/771f369448321a387f2018535bc8b8b6e5f12fab -> trunk/771f369448321a387f2018535bc8b8b6e5f12fab 2025-09-07T06:13:53.3695579Z * [new tag] trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 -> trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 2025-09-07T06:13:53.3696478Z * [new tag] trunk/791eff96c85678c950888f9da24650083ee673fe -> trunk/791eff96c85678c950888f9da24650083ee673fe 2025-09-07T06:13:53.3697191Z * [new tag] trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 -> trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 2025-09-07T06:13:53.3698191Z * [new tag] trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 -> trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 2025-09-07T06:13:53.3699146Z * [new tag] trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 -> trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 2025-09-07T06:13:53.3700109Z * [new tag] trunk/8076a185c85112be62be292eb47409c88a585b1c -> trunk/8076a185c85112be62be292eb47409c88a585b1c 2025-09-07T06:13:53.3701043Z * [new tag] trunk/80dd397f1979371a5583fa3d5c7352029522a78d -> trunk/80dd397f1979371a5583fa3d5c7352029522a78d 2025-09-07T06:13:53.3701811Z * [new tag] trunk/8171d6052ec12628eb67e0040839314056014429 -> trunk/8171d6052ec12628eb67e0040839314056014429 2025-09-07T06:13:53.3702772Z * [new tag] trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 -> trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 2025-09-07T06:13:53.3703837Z * [new tag] trunk/81b7b16618bda250ce55982894a83dc0805eb64c -> trunk/81b7b16618bda250ce55982894a83dc0805eb64c 2025-09-07T06:13:53.3704768Z * [new tag] trunk/827f0d405448de31f79d1089f7d7fceab2f87895 -> trunk/827f0d405448de31f79d1089f7d7fceab2f87895 2025-09-07T06:13:53.3705719Z * [new tag] trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 -> trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 
2025-09-07T06:13:53.3706771Z * [new tag] trunk/850e1382a9c56bfde18af09d3e72352d775e9435 -> trunk/850e1382a9c56bfde18af09d3e72352d775e9435 2025-09-07T06:13:53.3707773Z * [new tag] trunk/8678d831c48e616b717bff50f2d03141d2e9f965 -> trunk/8678d831c48e616b717bff50f2d03141d2e9f965 2025-09-07T06:13:53.3708764Z * [new tag] trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 -> trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 2025-09-07T06:13:53.3709752Z * [new tag] trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 -> trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 2025-09-07T06:13:53.3710702Z * [new tag] trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 -> trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 2025-09-07T06:13:53.3711884Z * [new tag] trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 -> trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 2025-09-07T06:13:53.3712730Z * [new tag] trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 -> trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 2025-09-07T06:13:53.3713728Z * [new tag] trunk/890626632def7e0ef95a2d01e87a0e4627824a9f -> trunk/890626632def7e0ef95a2d01e87a0e4627824a9f 2025-09-07T06:13:53.3714897Z * [new tag] trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 -> trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 2025-09-07T06:13:53.3715729Z * [new tag] trunk/89d41d3f61d04f14730ec26f008a59bef6624610 -> trunk/89d41d3f61d04f14730ec26f008a59bef6624610 2025-09-07T06:13:53.3716638Z * [new tag] trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 -> trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 2025-09-07T06:13:53.3717601Z * [new tag] trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af -> trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af 2025-09-07T06:13:53.3719039Z * [new tag] trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 -> trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 2025-09-07T06:13:53.3720026Z * [new tag] trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d -> trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d 2025-09-07T06:13:53.3721630Z * [new tag] 
trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 -> trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 2025-09-07T06:13:53.3723064Z * [new tag] trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 -> trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 2025-09-07T06:13:53.3723914Z * [new tag] trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab -> trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab 2025-09-07T06:13:53.3724850Z * [new tag] trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d -> trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d 2025-09-07T06:13:53.3725625Z * [new tag] trunk/93fb23d6fae7c4e82c4239a1033e522088742634 -> trunk/93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:53.3726549Z * [new tag] trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c -> trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c 2025-09-07T06:13:53.3727487Z * [new tag] trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e -> trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e 2025-09-07T06:13:53.3728563Z * [new tag] trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 -> trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 2025-09-07T06:13:53.3729468Z * [new tag] trunk/9499c8761cd2067feb9877414e818f6fd00290f1 -> trunk/9499c8761cd2067feb9877414e818f6fd00290f1 2025-09-07T06:13:53.3730525Z * [new tag] trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 -> trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 2025-09-07T06:13:53.3731543Z * [new tag] trunk/98374612fc2febd686be20761e56bdc2424bc36a -> trunk/98374612fc2febd686be20761e56bdc2424bc36a 2025-09-07T06:13:53.3733020Z * [new tag] trunk/98efc9e93d8fc61eb53cb91378443617cb550500 -> trunk/98efc9e93d8fc61eb53cb91378443617cb550500 2025-09-07T06:13:53.3733933Z * [new tag] trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 -> trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 2025-09-07T06:13:53.3734927Z * [new tag] trunk/99f356fa58c8d726cef022d8710f5491291158f6 -> trunk/99f356fa58c8d726cef022d8710f5491291158f6 2025-09-07T06:13:53.3735908Z * [new tag] trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 -> 
trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 2025-09-07T06:13:53.3736907Z * [new tag] trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd -> trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd 2025-09-07T06:13:53.3737873Z * [new tag] trunk/9aedb3cd87b52160872173c177f61053d97bed57 -> trunk/9aedb3cd87b52160872173c177f61053d97bed57 2025-09-07T06:13:53.3738871Z * [new tag] trunk/9b81fe281da41f2421506339d26b027a468902f4 -> trunk/9b81fe281da41f2421506339d26b027a468902f4 2025-09-07T06:13:53.3739869Z * [new tag] trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e -> trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e 2025-09-07T06:13:53.3740818Z * [new tag] trunk/9c03d6be87eedc06e524e202e07a7e776551a839 -> trunk/9c03d6be87eedc06e524e202e07a7e776551a839 2025-09-07T06:13:53.3741816Z * [new tag] trunk/9c957723a0fedd9c637e63e023a613019e2cab60 -> trunk/9c957723a0fedd9c637e63e023a613019e2cab60 2025-09-07T06:13:53.3742808Z * [new tag] trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 -> trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 2025-09-07T06:13:53.3743934Z * [new tag] trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 -> trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 2025-09-07T06:13:53.3744924Z * [new tag] trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 -> trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 2025-09-07T06:13:53.3745883Z * [new tag] trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 -> trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 2025-09-07T06:13:53.3746775Z * [new tag] trunk/a3c7f77e50f900721817934120d60c2361b3c40d -> trunk/a3c7f77e50f900721817934120d60c2361b3c40d 2025-09-07T06:13:53.3747724Z * [new tag] trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 -> trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 2025-09-07T06:13:53.3748675Z * [new tag] trunk/a3e5466002791da609fcb069155d8ee347baee92 -> trunk/a3e5466002791da609fcb069155d8ee347baee92 2025-09-07T06:13:53.3753610Z * [new tag] trunk/a714437093ed196eee28f7de454cf4c41badc098 -> trunk/a714437093ed196eee28f7de454cf4c41badc098 
2025-09-07T06:13:53.3754566Z * [new tag] trunk/a75e8cd27098f290de0b7439685d05ce02e91356 -> trunk/a75e8cd27098f290de0b7439685d05ce02e91356 2025-09-07T06:13:53.3755376Z * [new tag] trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae -> trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae 2025-09-07T06:13:53.3756378Z * [new tag] trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 -> trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 2025-09-07T06:13:53.3757382Z * [new tag] trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e -> trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e 2025-09-07T06:13:53.3758355Z * [new tag] trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 -> trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 2025-09-07T06:13:53.3759349Z * [new tag] trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 -> trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 2025-09-07T06:13:53.3760301Z * [new tag] trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c -> trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c 2025-09-07T06:13:53.3761443Z * [new tag] trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 -> trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 2025-09-07T06:13:53.3762425Z * [new tag] trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d -> trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d 2025-09-07T06:13:53.3763438Z * [new tag] trunk/adae7f66aacf3f248c3101b858cf98d5809119fa -> trunk/adae7f66aacf3f248c3101b858cf98d5809119fa 2025-09-07T06:13:53.3764463Z * [new tag] trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c -> trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c 2025-09-07T06:13:53.3765381Z * [new tag] trunk/aed33a8fcbd60b052d4559d261390c5797129c6d -> trunk/aed33a8fcbd60b052d4559d261390c5797129c6d 2025-09-07T06:13:53.3766578Z * [new tag] trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 -> trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 2025-09-07T06:13:53.3767532Z * [new tag] trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f -> trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f 2025-09-07T06:13:53.3768501Z * [new tag] 
trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 -> trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 2025-09-07T06:13:53.3769415Z * [new tag] trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 -> trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 2025-09-07T06:13:53.3770396Z * [new tag] trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de -> trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de 2025-09-07T06:13:53.3771393Z * [new tag] trunk/b2b4add0e754411372060e1d7b4057a66439172b -> trunk/b2b4add0e754411372060e1d7b4057a66439172b 2025-09-07T06:13:53.3772687Z * [new tag] trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 -> trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 2025-09-07T06:13:53.3773644Z * [new tag] trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 -> trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 2025-09-07T06:13:53.3774615Z * [new tag] trunk/b4ad38279b178b7bd14355123c1101e2e853e77b -> trunk/b4ad38279b178b7bd14355123c1101e2e853e77b 2025-09-07T06:13:53.3775619Z * [new tag] trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde -> trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde 2025-09-07T06:13:53.3776875Z * [new tag] trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c -> trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c 2025-09-07T06:13:53.3777766Z * [new tag] trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 -> trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 2025-09-07T06:13:53.3778767Z * [new tag] trunk/b7e207ca9f046ddd716076965a0cce403ba99052 -> trunk/b7e207ca9f046ddd716076965a0cce403ba99052 2025-09-07T06:13:53.3779905Z * [new tag] trunk/b919560c4a7010e2d89facee25586269a994746e -> trunk/b919560c4a7010e2d89facee25586269a994746e 2025-09-07T06:13:53.3780902Z * [new tag] trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 -> trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 2025-09-07T06:13:53.3781977Z * [new tag] trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 -> trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 2025-09-07T06:13:53.3782952Z * [new tag] trunk/bb950284c7e72905994bc25dd436c10e48088d85 -> 
trunk/bb950284c7e72905994bc25dd436c10e48088d85 2025-09-07T06:13:53.3784107Z * [new tag] trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d -> trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d 2025-09-07T06:13:53.3784921Z * [new tag] trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 -> trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 2025-09-07T06:13:53.3785839Z * [new tag] trunk/bc505977fb66677a09c31155c987330fbb18a865 -> trunk/bc505977fb66677a09c31155c987330fbb18a865 2025-09-07T06:13:53.3786836Z * [new tag] trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 -> trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 2025-09-07T06:13:53.3787898Z * [new tag] trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 -> trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 2025-09-07T06:13:53.3788864Z * [new tag] trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 -> trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 2025-09-07T06:13:53.3790316Z * [new tag] trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf -> trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf 2025-09-07T06:13:53.3791193Z * [new tag] trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 -> trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 2025-09-07T06:13:53.3792224Z * [new tag] trunk/c10195e723eeeedd099ed8b73eda7184ca618fad -> trunk/c10195e723eeeedd099ed8b73eda7184ca618fad 2025-09-07T06:13:53.3793165Z * [new tag] trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 -> trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 2025-09-07T06:13:53.3794111Z * [new tag] trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 -> trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 2025-09-07T06:13:53.3795120Z * [new tag] trunk/c32111149921b48bfef909293f1049e21619ed76 -> trunk/c32111149921b48bfef909293f1049e21619ed76 2025-09-07T06:13:53.3795926Z * [new tag] trunk/c37103234afc832dcad307e9016230810957c9d5 -> trunk/c37103234afc832dcad307e9016230810957c9d5 2025-09-07T06:13:53.3796893Z * [new tag] trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 -> trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 
2025-09-07T06:13:53.3797881Z * [new tag] trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd -> trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd 2025-09-07T06:13:53.3798939Z * [new tag] trunk/c465b3d52c5687fe910d35a5c75341b77f821741 -> trunk/c465b3d52c5687fe910d35a5c75341b77f821741 2025-09-07T06:13:53.3799923Z * [new tag] trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b -> trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b 2025-09-07T06:13:53.3800755Z * [new tag] trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 -> trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 2025-09-07T06:13:53.3801750Z * [new tag] trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 -> trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 2025-09-07T06:13:53.3802677Z * [new tag] trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b -> trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b 2025-09-07T06:13:53.3803663Z * [new tag] trunk/cbfb005f7cce79974795b148e265f594f59477c8 -> trunk/cbfb005f7cce79974795b148e265f594f59477c8 2025-09-07T06:13:53.3804696Z * [new tag] trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 -> trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 2025-09-07T06:13:53.3805862Z * [new tag] trunk/cd529b686d54bbaa443f5b310140de48422d96c7 -> trunk/cd529b686d54bbaa443f5b310140de48422d96c7 2025-09-07T06:13:53.3806798Z * [new tag] trunk/cec0ff122815582af5302360aff03676558c5c87 -> trunk/cec0ff122815582af5302360aff03676558c5c87 2025-09-07T06:13:53.3807745Z * [new tag] trunk/d11720efdb563d02cf4f7d324311fb15a755268e -> trunk/d11720efdb563d02cf4f7d324311fb15a755268e 2025-09-07T06:13:53.3808669Z * [new tag] trunk/d1706d9128ae24d9048167e80d3fe5196d19035e -> trunk/d1706d9128ae24d9048167e80d3fe5196d19035e 2025-09-07T06:13:53.3809690Z * [new tag] trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d -> trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d 2025-09-07T06:13:53.3810891Z * [new tag] trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 -> trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 2025-09-07T06:13:53.3812119Z * [new tag] 
trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e -> trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e 2025-09-07T06:13:53.3813155Z * [new tag] trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 -> trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 2025-09-07T06:13:53.3814142Z * [new tag] trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 -> trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 2025-09-07T06:13:53.3815109Z * [new tag] trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 -> trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 2025-09-07T06:13:53.3815948Z * [new tag] trunk/d5e0f4202ba14632e4d14862ace096609e763462 -> trunk/d5e0f4202ba14632e4d14862ace096609e763462 2025-09-07T06:13:53.3817011Z * [new tag] trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 -> trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 2025-09-07T06:13:53.3818707Z * [new tag] trunk/d64718503728001a1e78168fd7f2d4ff23e57285 -> trunk/d64718503728001a1e78168fd7f2d4ff23e57285 2025-09-07T06:13:53.3819709Z * [new tag] trunk/d67c29ad22670320d676b02e394274af34e8e643 -> trunk/d67c29ad22670320d676b02e394274af34e8e643 2025-09-07T06:13:53.3820721Z * [new tag] trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 -> trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 2025-09-07T06:13:53.3821734Z * [new tag] trunk/d711f27845abd45007ccab6076649ebd896c2661 -> trunk/d711f27845abd45007ccab6076649ebd896c2661 2025-09-07T06:13:53.3822717Z * [new tag] trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab -> trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab 2025-09-07T06:13:53.3823760Z * [new tag] trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 -> trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 2025-09-07T06:13:53.3824709Z * [new tag] trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 -> trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 2025-09-07T06:13:53.3825773Z * [new tag] trunk/dbec08729fb9848bebed6048c63831b87170d061 -> trunk/dbec08729fb9848bebed6048c63831b87170d061 2025-09-07T06:13:53.3826615Z * [new tag] trunk/dcf385395d838f38c8dca25913578230dd43099a -> 
trunk/dcf385395d838f38c8dca25913578230dd43099a 2025-09-07T06:13:53.3827689Z * [new tag] trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 -> trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 2025-09-07T06:13:53.3828674Z * [new tag] trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d -> trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d 2025-09-07T06:13:53.3829650Z * [new tag] trunk/e0a62b266c021b910ce6dc02a6c9429210487717 -> trunk/e0a62b266c021b910ce6dc02a6c9429210487717 2025-09-07T06:13:53.3830663Z * [new tag] trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 -> trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 2025-09-07T06:13:53.3831888Z * [new tag] trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 -> trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 2025-09-07T06:13:53.3832883Z * [new tag] trunk/e3068cdb446adefb5a875616ba37a60235391439 -> trunk/e3068cdb446adefb5a875616ba37a60235391439 2025-09-07T06:13:53.3833906Z * [new tag] trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 -> trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 2025-09-07T06:13:53.3834896Z * [new tag] trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 -> trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 2025-09-07T06:13:53.3835719Z * [new tag] trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 -> trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 2025-09-07T06:13:53.3836635Z * [new tag] trunk/e92cd9415377403b6e90585e764639e2e0b5973b -> trunk/e92cd9415377403b6e90585e764639e2e0b5973b 2025-09-07T06:13:53.3837621Z * [new tag] trunk/e9481b6617b5576b099d8ca5798111592e9ad090 -> trunk/e9481b6617b5576b099d8ca5798111592e9ad090 2025-09-07T06:13:53.3838449Z * [new tag] trunk/ea1883dfd3e42defe37b11202b878bb76defa087 -> trunk/ea1883dfd3e42defe37b11202b878bb76defa087 2025-09-07T06:13:53.3839446Z * [new tag] trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 -> trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 2025-09-07T06:13:53.3840368Z * [new tag] trunk/eb18d32bda75189494d955aa001ade15f10333de -> trunk/eb18d32bda75189494d955aa001ade15f10333de 
2025-09-07T06:13:53.3841186Z * [new tag] trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 -> trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 2025-09-07T06:13:53.3842154Z * [new tag] trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 -> trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 2025-09-07T06:13:53.3843362Z * [new tag] trunk/f00445b43eee57e20bb9316fa796ca23bf73373b -> trunk/f00445b43eee57e20bb9316fa796ca23bf73373b 2025-09-07T06:13:53.3844246Z * [new tag] trunk/f0c391102b754e3b145e8c59231d2df563487e37 -> trunk/f0c391102b754e3b145e8c59231d2df563487e37 2025-09-07T06:13:53.3845320Z * [new tag] trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 -> trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 2025-09-07T06:13:53.3846363Z * [new tag] trunk/f36f285953700f971552083a5da9d0ceacb63bbd -> trunk/f36f285953700f971552083a5da9d0ceacb63bbd 2025-09-07T06:13:53.3847335Z * [new tag] trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb -> trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb 2025-09-07T06:13:53.3848165Z * [new tag] trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c -> trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c 2025-09-07T06:13:53.3849491Z * [new tag] trunk/f612045ce105f008b2b675e2fc870163babeb2e8 -> trunk/f612045ce105f008b2b675e2fc870163babeb2e8 2025-09-07T06:13:53.3850701Z * [new tag] trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c -> trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c 2025-09-07T06:13:53.3851728Z * [new tag] trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c -> trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c 2025-09-07T06:13:53.3852744Z * [new tag] trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 -> trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 2025-09-07T06:13:53.3853756Z * [new tag] trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 -> trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 2025-09-07T06:13:53.3855421Z * [new tag] trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa -> trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa 2025-09-07T06:13:53.3856398Z * [new tag] 
trunk/fea20775ad96bdca972a1811d7d3372f368614ab -> trunk/fea20775ad96bdca972a1811d7d3372f368614ab 2025-09-07T06:13:53.3857237Z * [new tag] trunk/fefee081642f87419a21dc852f7167d4640443cd -> trunk/fefee081642f87419a21dc852f7167d4640443cd 2025-09-07T06:13:53.3857900Z * [new tag] v0.1.1 -> v0.1.1 2025-09-07T06:13:53.3858891Z * [new tag] v0.1.10 -> v0.1.10 2025-09-07T06:13:53.3859824Z * [new tag] v0.1.11 -> v0.1.11 2025-09-07T06:13:53.3860786Z * [new tag] v0.1.12 -> v0.1.12 2025-09-07T06:13:53.3861700Z * [new tag] v0.1.2 -> v0.1.2 2025-09-07T06:13:53.3862512Z * [new tag] v0.1.3 -> v0.1.3 2025-09-07T06:13:53.3863600Z * [new tag] v0.1.4 -> v0.1.4 2025-09-07T06:13:53.3864479Z * [new tag] v0.1.5 -> v0.1.5 2025-09-07T06:13:53.3865398Z * [new tag] v0.1.6 -> v0.1.6 2025-09-07T06:13:53.3866143Z * [new tag] v0.1.7 -> v0.1.7 2025-09-07T06:13:53.3866997Z * [new tag] v0.1.8 -> v0.1.8 2025-09-07T06:13:53.3867784Z * [new tag] v0.1.9 -> v0.1.9 2025-09-07T06:13:53.3868734Z * [new tag] v0.2.0 -> v0.2.0 2025-09-07T06:13:53.3869649Z * [new tag] v0.3.0 -> v0.3.0 2025-09-07T06:13:53.3870623Z * [new tag] v0.3.1 -> v0.3.1 2025-09-07T06:13:53.3871499Z * [new tag] v0.4.0 -> v0.4.0 2025-09-07T06:13:53.3872359Z * [new tag] v0.4.1 -> v0.4.1 2025-09-07T06:13:53.3873174Z * [new tag] v1.0.0 -> v1.0.0 2025-09-07T06:13:53.3874182Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-09-07T06:13:53.3875082Z * [new tag] v1.0.1 -> v1.0.1 2025-09-07T06:13:53.3875972Z * [new tag] v1.0rc0 -> v1.0rc0 2025-09-07T06:13:53.3876720Z * [new tag] v1.0rc1 -> v1.0rc1 2025-09-07T06:13:53.3877622Z * [new tag] v1.1.0 -> v1.1.0 2025-09-07T06:13:53.3878528Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-09-07T06:13:53.3880158Z * [new tag] v1.10.0 -> v1.10.0 2025-09-07T06:13:53.3881145Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-09-07T06:13:53.3882081Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-09-07T06:13:53.3882757Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-09-07T06:13:53.3883713Z * [new tag] v1.10.1 -> v1.10.1 2025-09-07T06:13:53.3884403Z * 
[new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-09-07T06:13:53.3885092Z * [new tag] v1.10.2 -> v1.10.2 2025-09-07T06:13:53.3885788Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-09-07T06:13:53.3886790Z * [new tag] v1.11.0 -> v1.11.0 2025-09-07T06:13:53.3887810Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-09-07T06:13:53.3888813Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-09-07T06:13:53.3889803Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-09-07T06:13:53.3890733Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-09-07T06:13:53.3891904Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-09-07T06:13:53.3892766Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-09-07T06:13:53.3893495Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-09-07T06:13:53.3894569Z * [new tag] v1.12.0 -> v1.12.0 2025-09-07T06:13:53.3895540Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-09-07T06:13:53.3896478Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-09-07T06:13:53.3897424Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-09-07T06:13:53.3898466Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-09-07T06:13:53.3899429Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-09-07T06:13:53.3900543Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-09-07T06:13:53.3901083Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-09-07T06:13:53.3901835Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-09-07T06:13:53.3902545Z * [new tag] v1.12.1 -> v1.12.1 2025-09-07T06:13:53.3903779Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-09-07T06:13:53.3904685Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-09-07T06:13:53.3905662Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-09-07T06:13:53.3906755Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-09-07T06:13:53.3907371Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-09-07T06:13:53.3908465Z * [new tag] v1.13.0 -> v1.13.0 2025-09-07T06:13:53.3909303Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-09-07T06:13:53.3910216Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-09-07T06:13:53.3911142Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-09-07T06:13:53.3912195Z * 
[new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-09-07T06:13:53.3912857Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-09-07T06:13:53.3913500Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-09-07T06:13:53.3914473Z * [new tag] v1.13.1 -> v1.13.1 2025-09-07T06:13:53.3915120Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-09-07T06:13:53.3916054Z * [new tag] v1.2.0 -> v1.2.0 2025-09-07T06:13:53.3916993Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-09-07T06:13:53.3917883Z * [new tag] v1.3.0 -> v1.3.0 2025-09-07T06:13:53.3918853Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-09-07T06:13:53.3919544Z * [new tag] v1.3.1 -> v1.3.1 2025-09-07T06:13:53.3920431Z * [new tag] v1.4.0 -> v1.4.0 2025-09-07T06:13:53.3921534Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-09-07T06:13:53.3922205Z * [new tag] v1.4.1 -> v1.4.1 2025-09-07T06:13:53.3923223Z * [new tag] v1.5.0 -> v1.5.0 2025-09-07T06:13:53.3924249Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-09-07T06:13:53.3925194Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-09-07T06:13:53.3926207Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-09-07T06:13:53.3927023Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-09-07T06:13:53.3927708Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-09-07T06:13:53.3928746Z * [new tag] v1.5.1 -> v1.5.1 2025-09-07T06:13:53.3929472Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-09-07T06:13:53.3930145Z * [new tag] v1.6.0 -> v1.6.0 2025-09-07T06:13:53.3931086Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-09-07T06:13:53.3932521Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-09-07T06:13:53.3933519Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-09-07T06:13:53.3934363Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-09-07T06:13:53.3935393Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-09-07T06:13:53.3936294Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-09-07T06:13:53.3937077Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-09-07T06:13:53.3938429Z * [new tag] v1.7.0 -> v1.7.0 2025-09-07T06:13:53.3939395Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-09-07T06:13:53.3940453Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 
2025-09-07T06:13:53.3941442Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-09-07T06:13:53.3942081Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-09-07T06:13:53.3943131Z * [new tag] v1.7.1 -> v1.7.1 2025-09-07T06:13:53.3944346Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-09-07T06:13:53.3945374Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-09-07T06:13:53.3946083Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-09-07T06:13:53.3947023Z * [new tag] v1.8.0 -> v1.8.0 2025-09-07T06:13:53.3947650Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-09-07T06:13:53.3948831Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-09-07T06:13:53.3950176Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-09-07T06:13:53.3950877Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-09-07T06:13:53.3951718Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-09-07T06:13:53.3952454Z * [new tag] v1.8.1 -> v1.8.1 2025-09-07T06:13:53.3953501Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-09-07T06:13:53.3954214Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-09-07T06:13:53.3954898Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-09-07T06:13:53.3956541Z * [new tag] v1.8.2 -> v1.8.2 2025-09-07T06:13:53.3957241Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-09-07T06:13:53.3958279Z * [new tag] v1.9.0 -> v1.9.0 2025-09-07T06:13:53.3959295Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-09-07T06:13:53.3960318Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-09-07T06:13:53.3961287Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-09-07T06:13:53.3962060Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-09-07T06:13:53.3963074Z * [new tag] v1.9.1 -> v1.9.1 2025-09-07T06:13:53.3964198Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-09-07T06:13:53.3964893Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-09-07T06:13:53.3965885Z * [new tag] v2.0.0 -> v2.0.0 2025-09-07T06:13:53.3966778Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-09-07T06:13:53.3967752Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-09-07T06:13:53.3968675Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-09-07T06:13:53.3969596Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 
2025-09-07T06:13:53.3970549Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-09-07T06:13:53.3971261Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-09-07T06:13:53.3972560Z * [new tag] v2.0.1 -> v2.0.1 2025-09-07T06:13:53.3973618Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-09-07T06:13:53.3974340Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-09-07T06:13:53.3975287Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-09-07T06:13:53.3976053Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-09-07T06:13:53.3977504Z * [new tag] v2.1.0 -> v2.1.0 2025-09-07T06:13:53.3978423Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-09-07T06:13:53.3979429Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-09-07T06:13:53.3980444Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-09-07T06:13:53.3981441Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-09-07T06:13:53.3982541Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-09-07T06:13:53.3983284Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-09-07T06:13:53.3984345Z * [new tag] v2.1.1 -> v2.1.1 2025-09-07T06:13:53.3985326Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-09-07T06:13:53.3986250Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-09-07T06:13:53.3987345Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-09-07T06:13:53.3988285Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-09-07T06:13:53.3989146Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-09-07T06:13:53.3989823Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-09-07T06:13:53.3990773Z * [new tag] v2.1.2 -> v2.1.2 2025-09-07T06:13:53.3991769Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-09-07T06:13:53.3992722Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-09-07T06:13:53.3993423Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-09-07T06:13:53.3994433Z * [new tag] v2.2.0 -> v2.2.0 2025-09-07T06:13:53.3995358Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-09-07T06:13:53.3996263Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-09-07T06:13:53.3997072Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-09-07T06:13:53.3998004Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-09-07T06:13:53.3999356Z * [new tag] v2.2.0-rc5 -> 
v2.2.0-rc5 2025-09-07T06:13:53.4000318Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-09-07T06:13:53.4001025Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-09-07T06:13:53.4001787Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-09-07T06:13:53.4002797Z * [new tag] v2.2.1 -> v2.2.1 2025-09-07T06:13:53.4003775Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-09-07T06:13:53.4004449Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-09-07T06:13:53.4005161Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-09-07T06:13:53.4005866Z * [new tag] v2.2.2 -> v2.2.2 2025-09-07T06:13:53.4006932Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-09-07T06:13:53.4007606Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-09-07T06:13:53.4008311Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-09-07T06:13:53.4009279Z * [new tag] v2.3.0 -> v2.3.0 2025-09-07T06:13:53.4010196Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-09-07T06:13:53.4011272Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-09-07T06:13:53.4012636Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-09-07T06:13:53.4013247Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-09-07T06:13:53.4014316Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-09-07T06:13:53.4015355Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-09-07T06:13:53.4016346Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-09-07T06:13:53.4017298Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-09-07T06:13:53.4017955Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-09-07T06:13:53.4018978Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-09-07T06:13:53.4019953Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-09-07T06:13:53.4020583Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-09-07T06:13:53.4021334Z * [new tag] v2.3.1 -> v2.3.1 2025-09-07T06:13:53.4022344Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-09-07T06:13:53.4023358Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-09-07T06:13:53.4024383Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-09-07T06:13:53.4025388Z * [new tag] v2.4.0 -> v2.4.0 2025-09-07T06:13:53.4026323Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-09-07T06:13:53.4027183Z * [new tag] 
v2.4.0-rc2 -> v2.4.0-rc2 2025-09-07T06:13:53.4028125Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-09-07T06:13:53.4029041Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-09-07T06:13:53.4030060Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-09-07T06:13:53.4031063Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-09-07T06:13:53.4031981Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-09-07T06:13:53.4032869Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-09-07T06:13:53.4033897Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-09-07T06:13:53.4034604Z * [new tag] v2.4.1 -> v2.4.1 2025-09-07T06:13:53.4035602Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-09-07T06:13:53.4036607Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-09-07T06:13:53.4037611Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-09-07T06:13:53.4038520Z * [new tag] v2.5.0 -> v2.5.0 2025-09-07T06:13:53.4039529Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-09-07T06:13:53.4040170Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-09-07T06:13:53.4041293Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-09-07T06:13:53.4042235Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-09-07T06:13:53.4043184Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-09-07T06:13:53.4044119Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-09-07T06:13:53.4045228Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-09-07T06:13:53.4046178Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-09-07T06:13:53.4047109Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-09-07T06:13:53.4048089Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-09-07T06:13:53.4048908Z * [new tag] v2.5.1 -> v2.5.1 2025-09-07T06:13:53.4050006Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-09-07T06:13:53.4050687Z * [new tag] v2.6.0 -> v2.6.0 2025-09-07T06:13:53.4051898Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-09-07T06:13:53.4052995Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-09-07T06:13:53.4054139Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-09-07T06:13:53.4054894Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-09-07T06:13:53.4056199Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 
2025-09-07T06:13:53.4057281Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-09-07T06:13:53.4058345Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-09-07T06:13:53.4059970Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-09-07T06:13:53.4061001Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-09-07T06:13:53.4062221Z * [new tag] v2.7.0 -> v2.7.0 2025-09-07T06:13:53.4063196Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-09-07T06:13:53.4063960Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-09-07T06:13:53.4065109Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-09-07T06:13:53.4066053Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-09-07T06:13:53.4067062Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-09-07T06:13:53.4067913Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-09-07T06:13:53.4068854Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-09-07T06:13:53.4069829Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-09-07T06:13:53.4070880Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-09-07T06:13:53.4071852Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-09-07T06:13:53.4072477Z * [new tag] v2.7.1 -> v2.7.1 2025-09-07T06:13:53.4073690Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-09-07T06:13:53.4074682Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-09-07T06:13:53.4075702Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-09-07T06:13:53.4076729Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-09-07T06:13:53.4077639Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-09-07T06:13:53.4078359Z * [new tag] v2.8.0 -> v2.8.0 2025-09-07T06:13:53.4079355Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-09-07T06:13:53.4080358Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-09-07T06:13:53.4081402Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-09-07T06:13:53.4082434Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-09-07T06:13:53.4083391Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-09-07T06:13:53.4084383Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-09-07T06:13:53.4085353Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-09-07T06:13:53.4086301Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-09-07T06:13:53.4087267Z * [new tag] 
whc_flight_1 -> whc_flight_1 2025-09-07T06:13:53.4088236Z * [new tag] whc_flight_2 -> whc_flight_2 2025-09-07T06:13:53.4089015Z * [new tag] whc_flight_4 -> whc_flight_4 2025-09-07T06:13:53.4801359Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T06:13:53.4828725Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:53.4830872Z ##[endgroup] 2025-09-07T06:13:53.4831175Z ##[group]Determining the checkout info 2025-09-07T06:13:53.4832279Z ##[endgroup] 2025-09-07T06:13:53.4836558Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T06:13:53.4880317Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T06:13:53.4923060Z ##[group]Checking out the ref 2025-09-07T06:13:53.4928298Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:54.5402082Z Updating files: 81% (15739/19405) 2025-09-07T06:13:54.5636686Z Updating files: 82% (15913/19405) 2025-09-07T06:13:54.5759009Z Updating files: 83% (16107/19405) 2025-09-07T06:13:54.5899670Z Updating files: 84% (16301/19405) 2025-09-07T06:13:54.6064766Z Updating files: 85% (16495/19405) 2025-09-07T06:13:54.6208138Z Updating files: 86% (16689/19405) 2025-09-07T06:13:54.6351027Z Updating files: 87% (16883/19405) 2025-09-07T06:13:54.6458838Z Updating files: 88% (17077/19405) 2025-09-07T06:13:54.6605857Z Updating files: 89% (17271/19405) 2025-09-07T06:13:54.6789007Z Updating files: 90% (17465/19405) 2025-09-07T06:13:54.6906052Z Updating files: 91% (17659/19405) 2025-09-07T06:13:54.7053098Z Updating files: 92% (17853/19405) 2025-09-07T06:13:54.7247319Z Updating files: 93% (18047/19405) 2025-09-07T06:13:54.7461142Z Updating files: 94% (18241/19405) 2025-09-07T06:13:54.7622825Z Updating files: 95% (18435/19405) 2025-09-07T06:13:54.7790224Z Updating files: 96% (18629/19405) 2025-09-07T06:13:54.7976008Z Updating files: 97% (18823/19405) 2025-09-07T06:13:54.8252255Z Updating files: 98% 
(19017/19405) 2025-09-07T06:13:54.8413115Z Updating files: 99% (19211/19405) 2025-09-07T06:13:54.8413517Z Updating files: 100% (19405/19405) 2025-09-07T06:13:54.8413872Z Updating files: 100% (19405/19405), done. 2025-09-07T06:13:54.8693365Z Note: switching to '93fb23d6fae7c4e82c4239a1033e522088742634'. 2025-09-07T06:13:54.8693769Z 2025-09-07T06:13:54.8694025Z You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T06:13:54.8694673Z changes and commit them, and you can discard any commits you make in this 2025-09-07T06:13:54.8695304Z state without impacting any branches by switching back to a branch. 2025-09-07T06:13:54.8695676Z 2025-09-07T06:13:54.8695951Z If you want to create a new branch to retain commits you create, you may 2025-09-07T06:13:54.8696522Z do so (now or later) by using -c with the switch command. Example: 2025-09-07T06:13:54.8696867Z 2025-09-07T06:13:54.8697009Z git switch -c <new-branch-name> 2025-09-07T06:13:54.8697236Z 2025-09-07T06:13:54.8697373Z Or undo this operation with: 2025-09-07T06:13:54.8697578Z 2025-09-07T06:13:54.8697680Z git switch - 2025-09-07T06:13:54.8697828Z 2025-09-07T06:13:54.8698113Z Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T06:13:54.8698515Z 2025-09-07T06:13:54.8698717Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T06:13:54.8783214Z ##[endgroup] 2025-09-07T06:13:54.8824752Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T06:13:54.8851003Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:54.8952315Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-09-07T06:13:54.8952725Z cd "${GITHUB_WORKSPACE}" 2025-09-07T06:13:54.8953090Z # Clean stale submodule dirs 2025-09-07T06:13:54.8953462Z if [ -z "${NO_SUDO}" ]; then 2025-09-07T06:13:54.8953908Z  sudo git submodule foreach --recursive git clean -ffdx 2025-09-07T06:13:54.8954351Z else 2025-09-07T06:13:54.8954691Z  git submodule foreach --recursive git clean -ffdx 2025-09-07T06:13:54.8955095Z fi 
2025-09-07T06:13:54.8964682Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:54.8965083Z env: 2025-09-07T06:13:54.8965309Z PY_VERS: 3.12 2025-09-07T06:13:54.8965627Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:54.8966050Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:54.8966354Z BUILD_DEVICE: cu129 2025-09-07T06:13:54.8966614Z NO_SUDO: 2025-09-07T06:13:54.8966827Z ##[endgroup] 2025-09-07T06:13:55.4126484Z Prepare all required actions 2025-09-07T06:13:55.4127112Z Getting action download info 2025-09-07T06:13:55.5293416Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e) 2025-09-07T06:13:55.8768295Z ##[group]Run ./.github/actions/setup-linux 2025-09-07T06:13:55.8768628Z env: 2025-09-07T06:13:55.8768849Z PY_VERS: 3.12 2025-09-07T06:13:55.8769181Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:55.8769598Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:55.8769897Z BUILD_DEVICE: cu129 2025-09-07T06:13:55.8770158Z ##[endgroup] 2025-09-07T06:13:55.8823055Z ##[group]Run set -euo pipefail 2025-09-07T06:13:55.8823442Z set -euo pipefail 2025-09-07T06:13:55.8823769Z function get_ec2_metadata() { 2025-09-07T06:13:55.8824304Z  # Pulled from instance metadata endpoint for EC2 2025-09-07T06:13:55.8824988Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-09-07T06:13:55.8825555Z  category=$1 2025-09-07T06:13:55.8825922Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-09-07T06:13:55.8826370Z  runner_name_str=i-04d2b1bca56e299e2 2025-09-07T06:13:55.8826701Z  if [[ -f /.inarc ]]; then 2025-09-07T06:13:55.8827048Z  echo "ARC Runner, no info on ec2 metadata" 2025-09-07T06:13:55.8827571Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-09-07T06:13:55.8828052Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-09-07T06:13:55.8828478Z  else 
2025-09-07T06:13:55.8829360Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-09-07T06:13:55.8830301Z  fi 2025-09-07T06:13:55.8830505Z } 2025-09-07T06:13:55.8830778Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-09-07T06:13:55.8831202Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-09-07T06:13:55.8831691Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-09-07T06:13:55.8832100Z echo "system info $(uname -a)" 2025-09-07T06:13:55.8838266Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:55.8838644Z env: 2025-09-07T06:13:55.8838850Z PY_VERS: 3.12 2025-09-07T06:13:55.8839158Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:55.8839536Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:55.8839834Z BUILD_DEVICE: cu129 2025-09-07T06:13:55.8840069Z ##[endgroup] 2025-09-07T06:13:55.9006014Z ami-id: ami-05ffe3c48a9991133 2025-09-07T06:13:55.9133033Z instance-id: i-04d2b1bca56e299e2 2025-09-07T06:13:55.9245309Z instance-type: r5.12xlarge 2025-09-07T06:13:55.9258177Z system info Linux ip-10-0-63-32.ec2.internal 6.1.141-155.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jun 17 10:29:47 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-09-07T06:13:55.9288105Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T06:13:55.9289133Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T06:13:55.9296603Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:55.9297027Z env: 2025-09-07T06:13:55.9297251Z PY_VERS: 3.12 2025-09-07T06:13:55.9297600Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:55.9298032Z PLATFORM: 
manylinux_2_28_x86_64 2025-09-07T06:13:55.9298364Z BUILD_DEVICE: cu129 2025-09-07T06:13:55.9298637Z ##[endgroup] 2025-09-07T06:13:55.9367273Z ##[group]Run if systemctl is-active --quiet docker; then 2025-09-07T06:13:55.9367783Z if systemctl is-active --quiet docker; then 2025-09-07T06:13:55.9368364Z  echo "Docker daemon is running..."; 2025-09-07T06:13:55.9368730Z else 2025-09-07T06:13:55.9369113Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-09-07T06:13:55.9369590Z fi 2025-09-07T06:13:55.9376136Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:55.9376575Z env: 2025-09-07T06:13:55.9376810Z PY_VERS: 3.12 2025-09-07T06:13:55.9377158Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:55.9377602Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:55.9377922Z BUILD_DEVICE: cu129 2025-09-07T06:13:55.9378200Z ##[endgroup] 2025-09-07T06:13:55.9499570Z Docker daemon is running... 2025-09-07T06:13:55.9549492Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T06:13:55.9549962Z with: 2025-09-07T06:13:55.9550199Z shell: bash 2025-09-07T06:13:55.9550438Z timeout_minutes: 5 2025-09-07T06:13:55.9550716Z max_attempts: 3 2025-09-07T06:13:55.9550971Z retry_wait_seconds: 30 2025-09-07T06:13:55.9553641Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. 
META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-09-07T06:13:55.9556512Z polling_interval_seconds: 1 2025-09-07T06:13:55.9556837Z warning_on_retry: true 2025-09-07T06:13:55.9557125Z continue_on_error: false 2025-09-07T06:13:55.9557413Z env: 2025-09-07T06:13:55.9557628Z PY_VERS: 3.12 2025-09-07T06:13:55.9557978Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:55.9558410Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:55.9558745Z BUILD_DEVICE: cu129 2025-09-07T06:13:55.9559018Z AWS_RETRY_MODE: standard 2025-09-07T06:13:55.9559316Z AWS_MAX_ATTEMPTS: 5 2025-09-07T06:13:55.9559612Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:13:55.9559912Z ##[endgroup] 2025-09-07T06:13:57.1543060Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-09-07T06:13:57.1543912Z Configure a credential helper to remove this warning. See 2025-09-07T06:13:57.1544549Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-09-07T06:13:57.1544997Z 2025-09-07T06:13:57.1545095Z Login Succeeded 2025-09-07T06:13:58.0467960Z Command completed after 1 attempt(s). 
2025-09-07T06:13:58.0534384Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:13:58.0535012Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:13:58.0535569Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:13:58.0543907Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:58.0544320Z env: 2025-09-07T06:13:58.0544541Z PY_VERS: 3.12 2025-09-07T06:13:58.0544874Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:58.0545269Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:58.0545582Z BUILD_DEVICE: cu129 2025-09-07T06:13:58.0545849Z ##[endgroup] 2025-09-07T06:13:58.0657102Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:13:58.0657793Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:13:58.0658286Z # shellcheck disable=SC2046 2025-09-07T06:13:58.0658686Z docker stop $(docker ps -q) || true 2025-09-07T06:13:58.0659105Z # Prune all of the docker images 2025-09-07T06:13:58.0659480Z docker system prune -af 2025-09-07T06:13:58.0665721Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:58.0666124Z env: 2025-09-07T06:13:58.0666552Z PY_VERS: 3.12 2025-09-07T06:13:58.0666869Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:58.0667277Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:58.0667573Z BUILD_DEVICE: cu129 2025-09-07T06:13:58.0667840Z ##[endgroup] 2025-09-07T06:13:58.1179182Z "docker stop" requires at least 1 argument. 2025-09-07T06:13:58.1179642Z See 'docker stop --help'. 2025-09-07T06:13:58.1179870Z 2025-09-07T06:13:58.1180059Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-09-07T06:13:58.1180375Z 2025-09-07T06:13:58.1180515Z Stop one or more running containers 2025-09-07T06:13:58.1420860Z Total reclaimed space: 0B 2025-09-07T06:13:58.1463451Z ##[group]Run set +e 2025-09-07T06:13:58.1463749Z set +e 2025-09-07T06:13:58.1463989Z set -x 2025-09-07T06:13:58.1464205Z  2025-09-07T06:13:58.1464463Z PT_DOMAIN=download.pytorch.org 2025-09-07T06:13:58.1465071Z # TODO: Flaky access to download.pytorch.org https://github.com/pytorch/pytorch/issues/100400, 2025-09-07T06:13:58.1465860Z # cleaning this up once the issue is fixed. There are more than one resolved IP here, the last 2025-09-07T06:13:58.1466414Z # one is returned at random 2025-09-07T06:13:58.1466807Z RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" | tail -n1) 2025-09-07T06:13:58.1467197Z  2025-09-07T06:13:58.1467421Z if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T06:13:58.1468006Z  echo "Couldn't resolve ${PT_DOMAIN}, retrying with Google DNS..." 2025-09-07T06:13:58.1468536Z  RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" @8.8.8.8 | tail -n1) 2025-09-07T06:13:58.1468944Z  2025-09-07T06:13:58.1469190Z  if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T06:13:58.1469574Z  echo "Couldn't resolve ${PT_DOMAIN}, exiting..." 
2025-09-07T06:13:58.1469949Z  exit 1 2025-09-07T06:13:58.1470176Z  fi 2025-09-07T06:13:58.1470397Z fi 2025-09-07T06:13:58.1470605Z  2025-09-07T06:13:58.1470883Z if grep -r "${PT_DOMAIN}" /etc/hosts; then 2025-09-07T06:13:58.1471253Z  # Clean up any old records first 2025-09-07T06:13:58.1471633Z  sudo sed -i "/${PT_DOMAIN}/d" /etc/hosts 2025-09-07T06:13:58.1471972Z fi 2025-09-07T06:13:58.1472173Z  2025-09-07T06:13:58.1472490Z echo "${RESOLVED_IP} ${PT_DOMAIN}" | sudo tee -a /etc/hosts 2025-09-07T06:13:58.1472893Z cat /etc/hosts 2025-09-07T06:13:58.1478573Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:58.1478945Z env: 2025-09-07T06:13:58.1479159Z PY_VERS: 3.12 2025-09-07T06:13:58.1479461Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:58.1479858Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:58.1480148Z BUILD_DEVICE: cu129 2025-09-07T06:13:58.1480404Z ##[endgroup] 2025-09-07T06:13:58.1508353Z + PT_DOMAIN=download.pytorch.org 2025-09-07T06:13:58.1513886Z ++ dig -4 +short download.pytorch.org 2025-09-07T06:13:58.1515044Z ++ tail -n1 2025-09-07T06:13:58.1935733Z + RESOLVED_IP=18.160.10.36 2025-09-07T06:13:58.1936081Z + '[' -z 18.160.10.36 ']' 2025-09-07T06:13:58.1936404Z + grep -r download.pytorch.org /etc/hosts 2025-09-07T06:13:58.1953300Z + echo '18.160.10.36 download.pytorch.org' 2025-09-07T06:13:58.1954079Z + sudo tee -a /etc/hosts 2025-09-07T06:13:58.3658545Z 18.160.10.36 download.pytorch.org 2025-09-07T06:13:58.3677343Z + cat /etc/hosts 2025-09-07T06:13:58.3688027Z 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 2025-09-07T06:13:58.3696158Z ::1 localhost6 localhost6.localdomain6 2025-09-07T06:13:58.3696847Z 18.160.10.36 download.pytorch.org 2025-09-07T06:13:58.3744335Z ##[group]Run set -eux 2025-09-07T06:13:58.3744737Z set -eux 2025-09-07T06:13:58.3744999Z  2025-09-07T06:13:58.3745403Z # Keep PyTorch nightly wheel here so that we can install it later during 
2025-09-07T06:13:58.3745940Z # vLLM build process 2025-09-07T06:13:58.3746300Z mkdir -p "${RUNNER_TEMP}/artifacts/" 2025-09-07T06:13:58.3746651Z  2025-09-07T06:13:58.3746912Z container_name=$(docker run \ 2025-09-07T06:13:58.3747245Z  --tty \ 2025-09-07T06:13:58.3747519Z  --detach \ 2025-09-07T06:13:58.3747791Z  -e PLATFORM \ 2025-09-07T06:13:58.3748126Z  -v "${GITHUB_WORKSPACE}:/pytorch" \ 2025-09-07T06:13:58.3748558Z  -v "${RUNNER_TEMP}/artifacts:/artifacts" \ 2025-09-07T06:13:58.3749356Z  -w /artifacts/ \ 2025-09-07T06:13:58.3749689Z  "${MANYLINUX_IMAGE}" 2025-09-07T06:13:58.3749999Z ) 2025-09-07T06:13:58.3750287Z  2025-09-07T06:13:58.3750746Z # Determine python executable for given version (copied from build-triton-wheel) 2025-09-07T06:13:58.3751337Z case $PY_VERS in 2025-09-07T06:13:58.3751643Z 3.10) 2025-09-07T06:13:58.3751994Z  PYTHON_EXECUTABLE=/opt/python/cp310-cp310/bin/python 2025-09-07T06:13:58.3752435Z  ;; 2025-09-07T06:13:58.3752673Z 3.11) 2025-09-07T06:13:58.3753039Z  PYTHON_EXECUTABLE=/opt/python/cp311-cp311/bin/python 2025-09-07T06:13:58.3753466Z  ;; 2025-09-07T06:13:58.3753717Z 3.12) 2025-09-07T06:13:58.3754062Z  PYTHON_EXECUTABLE=/opt/python/cp312-cp312/bin/python 2025-09-07T06:13:58.3754701Z  ;; 2025-09-07T06:13:58.3754955Z 3.13) 2025-09-07T06:13:58.3755300Z  PYTHON_EXECUTABLE=/opt/python/cp313-cp313/bin/python 2025-09-07T06:13:58.3755738Z  ;; 2025-09-07T06:13:58.3755977Z 3.13t) 2025-09-07T06:13:58.3756359Z  PYTHON_EXECUTABLE=/opt/python/cp313-cp313t/bin/python 2025-09-07T06:13:58.3756787Z  ;; 2025-09-07T06:13:58.3757039Z 3.14) 2025-09-07T06:13:58.3757385Z  PYTHON_EXECUTABLE=/opt/python/cp314-cp314/bin/python 2025-09-07T06:13:58.3757828Z  ;; 2025-09-07T06:13:58.3758075Z 3.14t) 2025-09-07T06:13:58.3758430Z  PYTHON_EXECUTABLE=/opt/python/cp314-cp314t/bin/python 2025-09-07T06:13:58.3758873Z  ;; 2025-09-07T06:13:58.3759106Z *) 2025-09-07T06:13:58.3759429Z  echo "Unsupported python version ${PY_VERS}" 2025-09-07T06:13:58.3759819Z  exit 1 
2025-09-07T06:13:58.3760077Z  ;; 2025-09-07T06:13:58.3760317Z esac 2025-09-07T06:13:58.3760563Z  2025-09-07T06:13:58.3760985Z docker exec -t "${container_name}" "${PYTHON_EXECUTABLE}" -mpip install \ 2025-09-07T06:13:58.3761546Z  --pre torch torchvision torchaudio \ 2025-09-07T06:13:58.3762255Z  --index-url "https://download.pytorch.org/whl/nightly/${BUILD_DEVICE}" 2025-09-07T06:13:58.3762756Z  2025-09-07T06:13:58.3763149Z # I wonder if there is a command to both download and install the wheels 2025-09-07T06:13:58.3763627Z # in one go 2025-09-07T06:13:58.3764073Z docker exec -t "${container_name}" "${PYTHON_EXECUTABLE}" -mpip download \ 2025-09-07T06:13:58.3764634Z  --pre torch torchvision torchaudio \ 2025-09-07T06:13:58.3765175Z  --index-url "https://download.pytorch.org/whl/nightly/${BUILD_DEVICE}" 2025-09-07T06:13:58.3765687Z  2025-09-07T06:13:58.3765924Z # Save this for later 2025-09-07T06:13:58.3766377Z echo "PYTHON_EXECUTABLE=${PYTHON_EXECUTABLE}" >> "$GITHUB_ENV" 2025-09-07T06:13:58.3766953Z echo "container_name=${container_name}" >> "$GITHUB_ENV" 2025-09-07T06:13:58.3775998Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:58.3776427Z env: 2025-09-07T06:13:58.3776874Z PY_VERS: 3.12 2025-09-07T06:13:58.3777225Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:58.3777679Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:58.3778025Z BUILD_DEVICE: cu129 2025-09-07T06:13:58.3778290Z ##[endgroup] 2025-09-07T06:13:58.3807974Z + mkdir -p /home/ec2-user/actions-runner/_work/_temp/artifacts/ 2025-09-07T06:13:58.3829212Z ++ docker run --tty --detach -e PLATFORM -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/pytorch -v /home/ec2-user/actions-runner/_work/_temp/artifacts:/artifacts -w /artifacts/ pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:13:58.4024880Z Unable to find image 'pytorch/manylinux2_28-builder:cuda12.9' locally 2025-09-07T06:13:58.6062222Z cuda12.9: Pulling from 
pytorch/manylinux2_28-builder 2025-09-07T06:13:58.6062798Z 401a23685cb3: Pulling fs layer 2025-09-07T06:13:58.6063140Z 52c3d198c9af: Pulling fs layer 2025-09-07T06:13:58.6063661Z fe278dc4d088: Pulling fs layer 2025-09-07T06:13:58.6063993Z 0783f967a46a: Pulling fs layer 2025-09-07T06:13:58.6064370Z 86349c7ebddf: Pulling fs layer 2025-09-07T06:13:58.6064670Z 90c23f3bfb32: Pulling fs layer 2025-09-07T06:13:58.6065051Z 1a1e757bc7fb: Pulling fs layer 2025-09-07T06:13:58.6065354Z 53d7e1600e77: Pulling fs layer 2025-09-07T06:13:58.6065667Z 47ea95eb3fe6: Pulling fs layer 2025-09-07T06:13:58.6066051Z 57daab4e44d7: Pulling fs layer 2025-09-07T06:13:58.6066351Z 94ec369fe56c: Pulling fs layer 2025-09-07T06:13:58.6066729Z db7c21e666ca: Pulling fs layer 2025-09-07T06:13:58.6067017Z 0783f967a46a: Waiting 2025-09-07T06:13:58.6067350Z 0b85a8e30754: Pulling fs layer 2025-09-07T06:13:58.6067662Z 12ad6180f0e8: Pulling fs layer 2025-09-07T06:13:58.6067975Z 7f82a06543eb: Pulling fs layer 2025-09-07T06:13:58.6068538Z 90c23f3bfb32: Waiting 2025-09-07T06:13:58.6068859Z 9807f368c615: Pulling fs layer 2025-09-07T06:13:58.6069164Z 86349c7ebddf: Waiting 2025-09-07T06:13:58.6069437Z 2d9a73894f57: Pulling fs layer 2025-09-07T06:13:58.6069725Z 1a1e757bc7fb: Waiting 2025-09-07T06:13:58.6070009Z 16900ac4285e: Pulling fs layer 2025-09-07T06:13:58.6070303Z 53d7e1600e77: Waiting 2025-09-07T06:13:58.6070554Z 47ea95eb3fe6: Waiting 2025-09-07T06:13:58.6070831Z bb61d5c1296d: Pulling fs layer 2025-09-07T06:13:58.6071128Z d305e6620c76: Pulling fs layer 2025-09-07T06:13:58.6071438Z db7c21e666ca: Waiting 2025-09-07T06:13:58.6071858Z 6a05569333cb: Pulling fs layer 2025-09-07T06:13:58.6072159Z 57daab4e44d7: Waiting 2025-09-07T06:13:58.6072489Z f4b9c70d03b6: Pulling fs layer 2025-09-07T06:13:58.6072792Z 0b85a8e30754: Waiting 2025-09-07T06:13:58.6073077Z e5d000fec9d3: Pulling fs layer 2025-09-07T06:13:58.6073438Z 2d9a73894f57: Waiting 2025-09-07T06:13:58.6073706Z 94ec369fe56c: Waiting 2025-09-07T06:13:58.6074046Z 
4f4fb700ef54: Pulling fs layer 2025-09-07T06:13:58.6074368Z 5aa748bf74cb: Pulling fs layer 2025-09-07T06:13:58.6074718Z 6ad951c51ccf: Pulling fs layer 2025-09-07T06:13:58.6075053Z 95a51416d539: Pulling fs layer 2025-09-07T06:13:58.6075336Z 16900ac4285e: Waiting 2025-09-07T06:13:58.6075664Z 9807f368c615: Waiting 2025-09-07T06:13:58.6075927Z bb61d5c1296d: Waiting 2025-09-07T06:13:58.6076196Z e5d000fec9d3: Waiting 2025-09-07T06:13:58.6076447Z 6a05569333cb: Waiting 2025-09-07T06:13:58.6076709Z d305e6620c76: Waiting 2025-09-07T06:13:58.6076961Z f4b9c70d03b6: Waiting 2025-09-07T06:13:58.6077230Z 12ad6180f0e8: Waiting 2025-09-07T06:13:58.6077497Z 4f4fb700ef54: Waiting 2025-09-07T06:13:58.6077750Z 6ad951c51ccf: Waiting 2025-09-07T06:13:58.6078017Z 5aa748bf74cb: Waiting 2025-09-07T06:13:59.2055705Z fe278dc4d088: Verifying Checksum 2025-09-07T06:13:59.2056102Z fe278dc4d088: Download complete 2025-09-07T06:13:59.3427458Z 401a23685cb3: Download complete 2025-09-07T06:13:59.4161991Z 86349c7ebddf: Download complete 2025-09-07T06:13:59.7237485Z 90c23f3bfb32: Verifying Checksum 2025-09-07T06:13:59.7238128Z 90c23f3bfb32: Download complete 2025-09-07T06:13:59.7672356Z 52c3d198c9af: Verifying Checksum 2025-09-07T06:13:59.7672763Z 52c3d198c9af: Download complete 2025-09-07T06:13:59.8214293Z 53d7e1600e77: Download complete 2025-09-07T06:13:59.8485312Z 1a1e757bc7fb: Verifying Checksum 2025-09-07T06:13:59.8485703Z 1a1e757bc7fb: Download complete 2025-09-07T06:13:59.9266481Z 57daab4e44d7: Download complete 2025-09-07T06:14:01.2596805Z 401a23685cb3: Pull complete 2025-09-07T06:14:01.5992469Z 0783f967a46a: Verifying Checksum 2025-09-07T06:14:01.7235939Z 0783f967a46a: Download complete 2025-09-07T06:14:01.7236341Z db7c21e666ca: Download complete 2025-09-07T06:14:01.8569061Z 0b85a8e30754: Verifying Checksum 2025-09-07T06:14:01.8569445Z 0b85a8e30754: Download complete 2025-09-07T06:14:01.9036502Z 52c3d198c9af: Pull complete 2025-09-07T06:14:01.9559622Z 12ad6180f0e8: Download complete 
2025-09-07T06:14:01.9661163Z 47ea95eb3fe6: Verifying Checksum 2025-09-07T06:14:01.9661584Z 47ea95eb3fe6: Download complete 2025-09-07T06:14:02.0374499Z 7f82a06543eb: Download complete 2025-09-07T06:14:02.0734156Z 9807f368c615: Verifying Checksum 2025-09-07T06:14:02.0734541Z 9807f368c615: Download complete 2025-09-07T06:14:02.1764929Z 16900ac4285e: Download complete 2025-09-07T06:14:02.2621460Z bb61d5c1296d: Verifying Checksum 2025-09-07T06:14:02.2621867Z bb61d5c1296d: Download complete 2025-09-07T06:14:02.3565871Z 2d9a73894f57: Verifying Checksum 2025-09-07T06:14:02.3566296Z 2d9a73894f57: Download complete 2025-09-07T06:14:02.3919118Z d305e6620c76: Verifying Checksum 2025-09-07T06:14:02.3919507Z d305e6620c76: Download complete 2025-09-07T06:14:02.7720518Z 94ec369fe56c: Verifying Checksum 2025-09-07T06:14:02.7720922Z 94ec369fe56c: Download complete 2025-09-07T06:14:02.8803018Z f4b9c70d03b6: Verifying Checksum 2025-09-07T06:14:02.8803426Z f4b9c70d03b6: Download complete 2025-09-07T06:14:02.9017341Z 4f4fb700ef54: Verifying Checksum 2025-09-07T06:14:02.9017729Z 4f4fb700ef54: Download complete 2025-09-07T06:14:03.0205981Z fe278dc4d088: Pull complete 2025-09-07T06:14:03.3166596Z e5d000fec9d3: Verifying Checksum 2025-09-07T06:14:03.3167004Z e5d000fec9d3: Download complete 2025-09-07T06:14:05.1313252Z 6ad951c51ccf: Verifying Checksum 2025-09-07T06:14:05.1313676Z 6ad951c51ccf: Download complete 2025-09-07T06:14:05.2129731Z 95a51416d539: Verifying Checksum 2025-09-07T06:14:05.2130109Z 95a51416d539: Download complete 2025-09-07T06:14:07.2194322Z 0783f967a46a: Pull complete 2025-09-07T06:14:07.2413084Z 86349c7ebddf: Pull complete 2025-09-07T06:14:07.4062814Z 90c23f3bfb32: Pull complete 2025-09-07T06:14:07.4293357Z 1a1e757bc7fb: Pull complete 2025-09-07T06:14:07.4529637Z 53d7e1600e77: Pull complete 2025-09-07T06:14:08.1295637Z 6a05569333cb: Verifying Checksum 2025-09-07T06:14:08.1296056Z 6a05569333cb: Download complete 2025-09-07T06:14:15.0827898Z 47ea95eb3fe6: Pull complete 
2025-09-07T06:14:15.1058964Z 57daab4e44d7: Pull complete 2025-09-07T06:14:18.6171855Z 94ec369fe56c: Pull complete 2025-09-07T06:14:18.6461388Z db7c21e666ca: Pull complete 2025-09-07T06:14:18.6714289Z 0b85a8e30754: Pull complete 2025-09-07T06:14:18.6931630Z 12ad6180f0e8: Pull complete 2025-09-07T06:14:18.7176643Z 7f82a06543eb: Pull complete 2025-09-07T06:14:18.7421827Z 9807f368c615: Pull complete 2025-09-07T06:14:18.8017484Z 2d9a73894f57: Pull complete 2025-09-07T06:14:18.8256912Z 16900ac4285e: Pull complete 2025-09-07T06:14:18.8474193Z bb61d5c1296d: Pull complete 2025-09-07T06:14:18.8688415Z d305e6620c76: Pull complete 2025-09-07T06:14:30.2504149Z 6a05569333cb: Pull complete 2025-09-07T06:14:31.2164673Z f4b9c70d03b6: Pull complete 2025-09-07T06:14:32.1060471Z e5d000fec9d3: Pull complete 2025-09-07T06:14:32.1312219Z 4f4fb700ef54: Pull complete 2025-09-07T06:15:31.0942900Z 5aa748bf74cb: Verifying Checksum 2025-09-07T06:15:31.0945178Z 5aa748bf74cb: Download complete 2025-09-07T06:16:48.0695333Z 5aa748bf74cb: Pull complete 2025-09-07T06:16:50.2922051Z 6ad951c51ccf: Pull complete 2025-09-07T06:16:50.7807352Z 95a51416d539: Pull complete 2025-09-07T06:16:51.0707766Z Digest: sha256:af68b954a0a5df04f9f2d7d0181ee3340dec6e378acf0db77a7d3b61d2ecc3aa 2025-09-07T06:16:51.1795143Z Status: Downloaded newer image for pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:18:03.8133667Z + container_name=80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T06:18:03.8135735Z + case $PY_VERS in 2025-09-07T06:18:03.8136134Z + PYTHON_EXECUTABLE=/opt/python/cp312-cp312/bin/python 2025-09-07T06:18:03.8137572Z + docker exec -t 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 /opt/python/cp312-cp312/bin/python -mpip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129 2025-09-07T06:18:04.2821545Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu129 2025-09-07T06:18:04.4837408Z Collecting torch 
2025-09-07T06:18:04.4884133Z Downloading https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250905%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-09-07T06:18:04.6281630Z Collecting torchvision 2025-09-07T06:18:04.6574867Z Downloading https://download.pytorch.org/whl/nightly/cu129/torchvision-0.24.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB) 2025-09-07T06:18:04.8014579Z Collecting torchaudio 2025-09-07T06:18:04.8674667Z Downloading https://download.pytorch.org/whl/nightly/cu129/torchaudio-2.8.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (7.3 kB) 2025-09-07T06:18:04.9792566Z Collecting filelock (from torch) 2025-09-07T06:18:04.9832804Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl.metadata (2.1 kB) 2025-09-07T06:18:05.0271738Z Collecting typing-extensions>=4.10.0 (from torch) 2025-09-07T06:18:05.0308029Z Downloading https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB) 2025-09-07T06:18:05.0387782Z Requirement already satisfied: setuptools in /opt/python/cp312-cp312/lib/python3.12/site-packages (from torch) (80.9.0) 2025-09-07T06:18:05.0679563Z Collecting sympy>=1.13.3 (from torch) 2025-09-07T06:18:05.0713669Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:18:05.1174098Z Collecting networkx>=2.5.1 (from torch) 2025-09-07T06:18:05.1214218Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl.metadata (6.3 kB) 2025-09-07T06:18:05.1637333Z Collecting jinja2 (from torch) 2025-09-07T06:18:05.1675222Z Downloading https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB) 2025-09-07T06:18:05.2009925Z Collecting fsspec>=0.8.5 (from torch) 2025-09-07T06:18:05.2044574Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB) 
2025-09-07T06:18:05.2681503Z Collecting nvidia-cuda-nvrtc-cu12==12.9.86 (from torch) 2025-09-07T06:18:05.2721331Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:18:05.3122633Z Collecting nvidia-cuda-runtime-cu12==12.9.79 (from torch) 2025-09-07T06:18:05.3163861Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:18:05.3811727Z Collecting nvidia-cuda-cupti-cu12==12.9.79 (from torch) 2025-09-07T06:18:05.3852466Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:18:05.4144922Z Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch) 2025-09-07T06:18:05.4195530Z Downloading https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:18:05.4486162Z Collecting nvidia-cublas-cu12==12.9.1.4 (from torch) 2025-09-07T06:18:05.4524821Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:18:05.4809159Z Collecting nvidia-cufft-cu12==11.4.1.4 (from torch) 2025-09-07T06:18:05.4852787Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:18:05.5156991Z Collecting nvidia-curand-cu12==10.3.10.19 (from torch) 2025-09-07T06:18:05.5203084Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:18:05.5759585Z Collecting nvidia-cusolver-cu12==11.7.5.82 (from torch) 2025-09-07T06:18:05.5797775Z Downloading 
https://download.pytorch.org/whl/nightly/cu129/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl.metadata (1.9 kB) 2025-09-07T06:18:05.6247002Z Collecting nvidia-cusparse-cu12==12.5.10.65 (from torch) 2025-09-07T06:18:05.6283925Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:18:05.6646607Z Collecting nvidia-cusparselt-cu12==0.7.1 (from torch) 2025-09-07T06:18:05.6686016Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB) 2025-09-07T06:18:05.7129145Z Collecting nvidia-nccl-cu12==2.27.5 (from torch) 2025-09-07T06:18:05.7165999Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-09-07T06:18:05.7648129Z Collecting nvidia-nvshmem-cu12==3.3.20 (from torch) 2025-09-07T06:18:05.7694680Z Downloading https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB) 2025-09-07T06:18:05.8073607Z Collecting nvidia-nvtx-cu12==12.9.79 (from torch) 2025-09-07T06:18:05.8113934Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:18:05.8590287Z Collecting nvidia-nvjitlink-cu12==12.9.86 (from torch) 2025-09-07T06:18:05.8631704Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:18:05.9339651Z Collecting nvidia-cufile-cu12==1.14.1.1 (from torch) 2025-09-07T06:18:05.9378670Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 
kB) 2025-09-07T06:18:06.0151980Z Collecting pytorch-triton==3.4.0+gitf7888497 (from torch) 2025-09-07T06:18:06.0195278Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:18:06.1079132Z Collecting numpy (from torchvision) 2025-09-07T06:18:06.1124135Z Downloading https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB) 2025-09-07T06:18:06.1333932Z Collecting torch 2025-09-07T06:18:06.1380852Z Downloading https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-09-07T06:18:06.2165388Z Collecting pillow!=8.3.*,>=5.3.0 (from torchvision) 2025-09-07T06:18:06.2213796Z Downloading https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (9.0 kB) 2025-09-07T06:18:06.2752699Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-09-07T06:18:06.2823862Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-09-07T06:18:06.2951399Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/536.2 kB ? eta -:--:-- 2025-09-07T06:18:06.2952316Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 31.9 MB/s 0:00:00 2025-09-07T06:18:06.3854958Z Collecting MarkupSafe>=2.0 (from jinja2->torch) 2025-09-07T06:18:06.3893588Z Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB) 2025-09-07T06:18:06.4044053Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl (581.2 MB) 2025-09-07T06:18:06.6075571Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/581.2 MB ? 
eta -:--:-- 2025-09-07T06:18:11.2834809Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 581.2/581.2 MB 62.5 MB/s 0:00:04 2025-09-07T06:18:11.2888279Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl (10.8 MB) 2025-09-07T06:18:11.3326806Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/10.8 MB ? eta -:--:-- 2025-09-07T06:18:11.3327661Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.8/10.8 MB 259.4 MB/s 0:00:00 2025-09-07T06:18:11.3374641Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (89.6 MB) 2025-09-07T06:18:11.5409424Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/89.6 MB ? 
eta -:--:-- 2025-09-07T06:18:11.9982656Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.6/89.6 MB 135.8 MB/s 0:00:00 2025-09-07T06:18:12.0037432Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.5 MB) 2025-09-07T06:18:12.0220280Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/3.5 MB ? eta -:--:-- 2025-09-07T06:18:12.0221174Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 211.7 MB/s 0:00:00 2025-09-07T06:18:12.0258806Z Downloading https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB) 2025-09-07T06:18:12.2294518Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/706.8 MB ? 
eta -:--:-- 2025-09-07T06:18:17.9971110Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 706.8/706.8 MB 50.4 MB/s 0:00:05 2025-09-07T06:18:18.0029235Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.9 MB) 2025-09-07T06:18:18.2058080Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/200.9 MB ? 
eta -:--:-- 2025-09-07T06:18:19.4832925Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.9/200.9 MB 135.9 MB/s 0:00:01 2025-09-07T06:18:19.4881742Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-09-07T06:18:19.4992120Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.2 MB ? eta -:--:-- 2025-09-07T06:18:19.4992931Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 122.8 MB/s 0:00:00 2025-09-07T06:18:19.5044542Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl (68.3 MB) 2025-09-07T06:18:19.7076272Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/68.3 MB ? 
eta -:--:-- 2025-09-07T06:18:19.9973843Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.3/68.3 MB 139.0 MB/s 0:00:00 2025-09-07T06:18:20.0029154Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl (338.1 MB) 2025-09-07T06:18:20.2070679Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/338.1 MB ? eta -:--:-- 2025-09-07T06:18:22.5969620Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 338.1/338.1 MB 112.8 MB/s 0:00:02 2025-09-07T06:18:22.6020632Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (366.5 MB) 2025-09-07T06:18:22.8052341Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/366.5 MB ? eta -:--:-- 2025-09-07T06:18:25.3938381Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 366.5/366.5 MB 101.6 MB/s 0:00:02 2025-09-07T06:18:25.3995519Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB) 
2025-09-07T06:18:25.6020993Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/287.2 MB ? eta -:--:-- 2025-09-07T06:18:27.6176261Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.2/287.2 MB 125.2 MB/s 0:00:02 2025-09-07T06:18:27.6240309Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB) 2025-09-07T06:18:27.8264090Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/322.3 MB ? 
eta -:--:-- 2025-09-07T06:18:30.0821044Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.3/322.3 MB 118.2 MB/s 0:00:02 2025-09-07T06:18:30.0822411Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB) 2025-09-07T06:18:30.2856512Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/39.7 MB ? 
eta -:--:-- 2025-09-07T06:18:30.3525710Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.7/39.7 MB 148.2 MB/s 0:00:00 2025-09-07T06:18:30.3568297Z Downloading https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB) 2025-09-07T06:18:30.5599614Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/124.7 MB ? eta -:--:-- 2025-09-07T06:18:31.3099028Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.7/124.7 MB 130.9 MB/s 0:00:00 2025-09-07T06:18:31.3141711Z Downloading https://download.pytorch.org/whl/nightly/cu129/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (85 kB) 2025-09-07T06:18:31.3910108Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) 2025-09-07T06:18:31.5936082Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/155.6 MB ? 
eta -:--:-- 2025-09-07T06:18:33.1799412Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.6/155.6 MB 87.0 MB/s 0:00:01 2025-09-07T06:18:33.2754788Z Downloading https://download.pytorch.org/whl/nightly/cu129/torchvision-0.24.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl (9.5 MB) 2025-09-07T06:18:33.4452561Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/9.5 MB ? eta -:--:-- 2025-09-07T06:18:33.4453393Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 55.3 MB/s 0:00:00 2025-09-07T06:18:33.5629257Z Downloading https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl (1254.1 MB) 2025-09-07T06:18:33.7661158Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 GB ? 
eta -:--:-- 2025-09-07T06:18:33.9675577Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 GB 46.9 MB/s eta 0:00:27 2025-09-07T06:18:34.1693727Z  ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 GB 46.0 MB/s eta 0:00:27 2025-09-07T06:18:34.3711519Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 GB 61.3 MB/s eta 0:00:20 2025-09-07T06:18:34.5727902Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 GB 55.4 MB/s eta 0:00:22 2025-09-07T06:18:34.7738457Z  ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 GB 51.5 MB/s eta 0:00:24 2025-09-07T06:18:34.9753327Z  ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 46.1 MB/s eta 0:00:27 2025-09-07T06:18:35.1770431Z  ━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 46.4 MB/s eta 0:00:26 2025-09-07T06:18:35.3787613Z  ━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 46.1 MB/s eta 0:00:26 2025-09-07T06:18:35.5802849Z  ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 46.8 MB/s eta 0:00:25 2025-09-07T06:18:35.7815506Z  ━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 49.9 MB/s eta 0:00:24 2025-09-07T06:18:35.9835197Z  ━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 51.2 MB/s eta 0:00:23 2025-09-07T06:18:36.1848107Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 54.2 MB/s eta 0:00:21 2025-09-07T06:18:36.3864053Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.3 GB 53.7 MB/s eta 0:00:21 2025-09-07T06:18:36.5882195Z  ━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 54.0 MB/s eta 0:00:21 2025-09-07T06:18:36.7894903Z  ━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 53.6 MB/s eta 0:00:21 2025-09-07T06:18:36.9913280Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 54.2 MB/s eta 0:00:20 2025-09-07T06:18:37.1931113Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 54.1 MB/s eta 0:00:20 2025-09-07T06:18:37.3950876Z  ━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 53.6 MB/s eta 0:00:20 2025-09-07T06:18:37.5971106Z  ━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 55.2 MB/s eta 0:00:19 
2025-09-07T06:18:37.7990843Z  ━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 55.1 MB/s eta 0:00:19 2025-09-07T06:18:38.0006077Z  ━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.2/1.3 GB 55.8 MB/s eta 0:00:19 2025-09-07T06:18:38.2035338Z  ━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 57.2 MB/s eta 0:00:18 2025-09-07T06:18:38.4053529Z  ━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 56.5 MB/s eta 0:00:18 2025-09-07T06:18:38.6071035Z  ━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 54.7 MB/s eta 0:00:19 2025-09-07T06:18:38.8089837Z  ━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 55.9 MB/s eta 0:00:18 2025-09-07T06:18:39.0113269Z  ━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 54.7 MB/s eta 0:00:18 2025-09-07T06:18:39.2131248Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 55.3 MB/s eta 0:00:18 2025-09-07T06:18:39.4148432Z  ━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 59.8 MB/s eta 0:00:16 2025-09-07T06:18:39.6163867Z  ━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 60.9 MB/s eta 0:00:16 2025-09-07T06:18:39.8178579Z  ━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.3/1.3 GB 61.7 MB/s eta 0:00:15 2025-09-07T06:18:40.0197335Z  ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 60.8 MB/s eta 0:00:15 2025-09-07T06:18:40.2215945Z  ━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 60.4 MB/s eta 0:00:15 2025-09-07T06:18:40.4235306Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 59.7 MB/s eta 0:00:15 2025-09-07T06:18:40.6252906Z  ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 59.8 MB/s eta 0:00:15 2025-09-07T06:18:40.8268771Z  ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 59.1 MB/s eta 0:00:15 2025-09-07T06:18:41.0285778Z  ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 60.4 MB/s eta 0:00:14 2025-09-07T06:18:41.2299880Z  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 60.8 MB/s eta 0:00:14 2025-09-07T06:18:41.4318003Z  ━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━ 0.4/1.3 GB 61.5 MB/s eta 0:00:14 
2025-09-07T06:18:41.6335654Z  ━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 62.9 MB/s eta 0:00:13 2025-09-07T06:18:41.8352725Z  ━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 61.6 MB/s eta 0:00:13 2025-09-07T06:18:42.0373979Z  ━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 62.8 MB/s eta 0:00:13 2025-09-07T06:18:42.2393404Z  ━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 62.8 MB/s eta 0:00:12 2025-09-07T06:18:42.4405636Z  ━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 61.9 MB/s eta 0:00:13 2025-09-07T06:18:42.6422070Z  ━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 59.8 MB/s eta 0:00:13 2025-09-07T06:18:42.8441870Z  ━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━ 0.5/1.3 GB 63.6 MB/s eta 0:00:12 2025-09-07T06:18:43.0453072Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 66.2 MB/s eta 0:00:11 2025-09-07T06:18:43.2472172Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 65.6 MB/s eta 0:00:11 2025-09-07T06:18:43.4486660Z  ━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 64.2 MB/s eta 0:00:11 2025-09-07T06:18:43.6503604Z  ━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 63.8 MB/s eta 0:00:11 2025-09-07T06:18:43.8524859Z  ━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 63.0 MB/s eta 0:00:11 2025-09-07T06:18:44.0538053Z  ━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 65.2 MB/s eta 0:00:10 2025-09-07T06:18:44.2559061Z  ━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 63.2 MB/s eta 0:00:11 2025-09-07T06:18:44.4573922Z  ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 62.6 MB/s eta 0:00:11 2025-09-07T06:18:44.6590244Z  ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 61.6 MB/s eta 0:00:11 2025-09-07T06:18:44.8604452Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 0.6/1.3 GB 61.2 MB/s eta 0:00:10 2025-09-07T06:18:45.0620272Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 60.7 MB/s eta 0:00:10 2025-09-07T06:18:45.2642060Z  ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 60.0 MB/s eta 0:00:10 
2025-09-07T06:18:45.4657897Z  ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 58.8 MB/s eta 0:00:10 2025-09-07T06:18:45.6676496Z  ━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 57.2 MB/s eta 0:00:11 2025-09-07T06:18:45.8694539Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 57.6 MB/s eta 0:00:10 2025-09-07T06:18:46.0712678Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 56.1 MB/s eta 0:00:10 2025-09-07T06:18:46.2731378Z  ━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━ 0.7/1.3 GB 55.6 MB/s eta 0:00:10 2025-09-07T06:18:46.4743640Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 0.7/1.3 GB 54.9 MB/s eta 0:00:10 2025-09-07T06:18:46.6760090Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 0.7/1.3 GB 54.6 MB/s eta 0:00:10 2025-09-07T06:18:46.8785509Z  ━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━ 0.7/1.3 GB 54.5 MB/s eta 0:00:10 2025-09-07T06:18:47.0803798Z  ━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━ 0.8/1.3 GB 54.5 MB/s eta 0:00:10 2025-09-07T06:18:47.2818618Z  ━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━ 0.8/1.3 GB 53.1 MB/s eta 0:00:10 2025-09-07T06:18:47.4836580Z  ━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━ 0.8/1.3 GB 55.0 MB/s eta 0:00:09 2025-09-07T06:18:47.6852619Z  ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━ 0.8/1.3 GB 53.7 MB/s eta 0:00:09 2025-09-07T06:18:47.8869853Z  ━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━ 0.8/1.3 GB 53.0 MB/s eta 0:00:09 2025-09-07T06:18:48.0889700Z  ━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━ 0.8/1.3 GB 51.7 MB/s eta 0:00:09 2025-09-07T06:18:48.2905136Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━ 0.8/1.3 GB 51.2 MB/s eta 0:00:09 2025-09-07T06:18:48.4922688Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━ 0.8/1.3 GB 50.7 MB/s eta 0:00:09 2025-09-07T06:18:48.6938239Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━ 0.8/1.3 GB 51.1 MB/s eta 0:00:09 2025-09-07T06:18:48.8961989Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━ 0.9/1.3 GB 51.3 MB/s eta 0:00:08 2025-09-07T06:18:49.0975940Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━ 0.9/1.3 GB 50.3 MB/s eta 0:00:08 
2025-09-07T06:18:49.2994152Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━ 0.9/1.3 GB 49.8 MB/s eta 0:00:08 2025-09-07T06:18:49.5011339Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━ 0.9/1.3 GB 51.5 MB/s eta 0:00:08 2025-09-07T06:18:49.7032080Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 0.9/1.3 GB 52.1 MB/s eta 0:00:07 2025-09-07T06:18:49.9049199Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━ 0.9/1.3 GB 52.3 MB/s eta 0:00:07 2025-09-07T06:18:50.1064989Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━ 0.9/1.3 GB 52.2 MB/s eta 0:00:07 2025-09-07T06:18:50.3080034Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━ 0.9/1.3 GB 52.9 MB/s eta 0:00:07 2025-09-07T06:18:50.5098128Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 0.9/1.3 GB 55.0 MB/s eta 0:00:06 2025-09-07T06:18:50.7118788Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 1.0/1.3 GB 54.0 MB/s eta 0:00:06 2025-09-07T06:18:50.9133838Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━ 1.0/1.3 GB 53.4 MB/s eta 0:00:06 2025-09-07T06:18:51.1153471Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━ 1.0/1.3 GB 53.8 MB/s eta 0:00:06 2025-09-07T06:18:51.3168682Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━ 1.0/1.3 GB 53.0 MB/s eta 0:00:06 2025-09-07T06:18:51.5182807Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━ 1.0/1.3 GB 54.6 MB/s eta 0:00:05 2025-09-07T06:18:51.7203543Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━ 1.0/1.3 GB 54.3 MB/s eta 0:00:05 2025-09-07T06:18:51.9216501Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━ 1.0/1.3 GB 53.8 MB/s eta 0:00:05 2025-09-07T06:18:52.1236354Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━ 1.0/1.3 GB 55.8 MB/s eta 0:00:04 2025-09-07T06:18:52.3252288Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━ 1.0/1.3 GB 55.7 MB/s eta 0:00:04 2025-09-07T06:18:52.5266945Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━ 1.1/1.3 GB 56.5 MB/s eta 0:00:04 2025-09-07T06:18:52.7286159Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━ 1.1/1.3 GB 57.9 MB/s eta 0:00:04 2025-09-07T06:18:52.9298244Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━ 1.1/1.3 GB 57.9 MB/s eta 0:00:03 
2025-09-07T06:18:53.1317792Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━ 1.1/1.3 GB 56.3 MB/s eta 0:00:03 2025-09-07T06:18:53.3332053Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━ 1.1/1.3 GB 56.6 MB/s eta 0:00:03 2025-09-07T06:18:53.5348232Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 1.1/1.3 GB 55.8 MB/s eta 0:00:03 2025-09-07T06:18:53.7367216Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━ 1.1/1.3 GB 57.8 MB/s eta 0:00:03 2025-09-07T06:18:53.9381528Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━ 1.1/1.3 GB 58.2 MB/s eta 0:00:02 2025-09-07T06:18:54.1403626Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━ 1.1/1.3 GB 56.7 MB/s eta 0:00:02 2025-09-07T06:18:54.3415798Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 1.2/1.3 GB 57.3 MB/s eta 0:00:02 2025-09-07T06:18:54.5435733Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 1.2/1.3 GB 57.1 MB/s eta 0:00:02 2025-09-07T06:18:54.7452002Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 1.2/1.3 GB 58.3 MB/s eta 0:00:02 2025-09-07T06:18:54.9469946Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 1.2/1.3 GB 58.6 MB/s eta 0:00:01 2025-09-07T06:18:55.1488055Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 1.2/1.3 GB 59.3 MB/s eta 0:00:01 2025-09-07T06:18:55.3499772Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 1.2/1.3 GB 58.4 MB/s eta 0:00:01 2025-09-07T06:18:55.5518347Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.2/1.3 GB 60.5 MB/s eta 0:00:01 2025-09-07T06:18:55.7535115Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 61.5 MB/s eta 0:00:01 2025-09-07T06:18:55.9551459Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:18:56.1569008Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:18:56.3589918Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:18:56.5604556Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:18:56.7623346Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 
2025-09-07T06:19:08.4620368Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:08.6637583Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:08.8652838Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:09.0675050Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:09.2693484Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:09.4710146Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:09.6725872Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:09.8694471Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 1.3/1.3 GB 59.8 MB/s eta 0:00:01 2025-09-07T06:19:09.8695326Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 GB 14.1 MB/s 0:00:36 2025-09-07T06:19:09.8747521Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu129/torchaudio-2.8.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl (2.3 MB) 2025-09-07T06:19:10.0775945Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/2.3 MB ? eta -:--:-- 2025-09-07T06:19:10.1856253Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━ 2.1/2.3 MB 10.7 MB/s eta 0:00:01 2025-09-07T06:19:10.1857245Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 6.8 MB/s 0:00:00 2025-09-07T06:19:10.1905668Z [?25hDownloading https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl (199 kB) 2025-09-07T06:19:10.2975067Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl (2.0 MB) 2025-09-07T06:19:10.4174835Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/2.0 MB ? 
eta -:--:-- 2025-09-07T06:19:10.4175707Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 15.2 MB/s 0:00:00 2025-09-07T06:19:10.4217966Z [?25hDownloading https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB) 2025-09-07T06:19:10.6251616Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/6.6 MB ? eta -:--:-- 2025-09-07T06:19:10.6440972Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺ 6.6/6.6 MB 474.3 MB/s eta 0:00:01 2025-09-07T06:19:10.6441826Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 29.2 MB/s 0:00:00 2025-09-07T06:19:10.6490535Z [?25hDownloading https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl (6.3 MB) 2025-09-07T06:19:10.8520293Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/6.3 MB ? eta -:--:-- 2025-09-07T06:19:10.8659681Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 6.3/6.3 MB 482.9 MB/s eta 0:00:01 2025-09-07T06:19:10.8660857Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 28.2 MB/s 0:00:00 2025-09-07T06:19:10.8703515Z [?25hDownloading https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl (43 kB) 2025-09-07T06:19:10.9770035Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl (15 kB) 2025-09-07T06:19:11.0852562Z Downloading https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl (134 kB) 2025-09-07T06:19:11.2140800Z Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB) 2025-09-07T06:19:11.3412945Z Downloading https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB) 2025-09-07T06:19:11.5446494Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/16.6 MB ? 
eta -:--:-- 2025-09-07T06:19:11.6654635Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 16.5/16.6 MB 359.9 MB/s eta 0:00:01 2025-09-07T06:19:11.6655499Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 51.0 MB/s 0:00:00 2025-09-07T06:19:23.6437483Z [?25hInstalling collected packages: nvidia-cusparselt-cu12, mpmath, typing-extensions, sympy, pytorch-triton, pillow, nvidia-nvtx-cu12, nvidia-nvshmem-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio 2025-09-07T06:19:23.8116080Z [?25l 2025-09-07T06:19:23.9793511Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:24.1469356Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:24.3143026Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:24.4820226Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:24.6496874Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:24.8174084Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:24.9851161Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:25.1528002Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:25.3205279Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:25.4879921Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:25.6554091Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:25.8232446Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 
[nvidia-cusparselt-cu12] 2025-09-07T06:19:25.9909724Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:26.1585201Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:26.3292442Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:19:26.5001252Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  1/29 [mpmath] 2025-09-07T06:19:26.6676281Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  1/29 [mpmath] 2025-09-07T06:19:26.8352263Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  1/29 [mpmath] 2025-09-07T06:19:27.0029678Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:27.1992908Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:27.3670378Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:27.5390957Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:27.7220127Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:27.8977551Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:28.0654327Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:28.2433640Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:28.4119060Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:28.5794488Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:28.7490280Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:28.9164437Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:29.0849058Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:29.2772495Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:29.4449487Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:29.6294625Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:29.8123266Z  
━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:29.9809840Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:30.2923544Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:30.4618944Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:30.7142565Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:30.8864798Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:31.0585138Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:31.2336387Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:31.4033870Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:31.5710918Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:31.7414323Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:31.9098141Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:32.0782203Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:32.2534006Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:32.4381933Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:32.6084207Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:32.7758626Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:32.9473518Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:33.1172750Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:33.2852053Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:33.4532996Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:33.6209636Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy] 2025-09-07T06:19:33.7886113Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:33.9559430Z  
━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:34.1239220Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:34.2916436Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:34.4592923Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:34.6269615Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:34.7943134Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:34.9620832Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:35.1299039Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:35.2975956Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:35.4654977Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:35.6334490Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:35.8012468Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:35.9690830Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:36.1393909Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:36.3070121Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton] 2025-09-07T06:19:36.4774732Z  ━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  5/29 [pillow] 2025-09-07T06:19:36.6452201Z  ━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  5/29 [pillow] 2025-09-07T06:19:36.8130252Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:36.9807665Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:37.1484122Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:37.3158287Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:37.4836835Z  
━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:37.6514080Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:37.8190970Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:37.9872980Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:38.1548104Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12] 2025-09-07T06:19:38.3222053Z  ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━  8/29 [nvidia-nvjitlink-cu12] 2025-09-07T06:19:38.4899226Z  ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━  8/29 [nvidia-nvjitlink-cu12] 2025-09-07T06:19:38.6577444Z  ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━  8/29 [nvidia-nvjitlink-cu12] 2025-09-07T06:19:38.8253821Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:38.9932864Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:39.1608718Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:39.3284391Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:39.4958330Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:39.6638081Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:39.8315496Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:39.9992769Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:40.1670362Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:40.3344056Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:40.5022119Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:40.6699032Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:40.8375215Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 
[nvidia-nccl-cu12] 2025-09-07T06:19:41.0052371Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:41.1730522Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:41.3405851Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:41.5082172Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:41.6755774Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:41.8435238Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:42.0112732Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12] 2025-09-07T06:19:42.1789641Z  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 10/29 [nvidia-curand-cu12] 2025-09-07T06:19:42.3463873Z  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 10/29 [nvidia-curand-cu12] 2025-09-07T06:19:42.5141328Z  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 10/29 [nvidia-curand-cu12] 2025-09-07T06:19:42.6817934Z  ━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━ 11/29 [nvidia-cufile-cu12] 2025-09-07T06:19:42.8495623Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:43.0174571Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:43.1851819Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:43.3529518Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:43.5206267Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:43.6882999Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:43.8564542Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:44.0236698Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12] 2025-09-07T06:19:44.1912818Z  ━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━ 14/29 
[nvidia-cuda-cupti-cu12] 2025-09-07T06:19:44.3588596Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:44.5262944Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:44.6941237Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:44.8618564Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:45.0295691Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:45.1973978Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:45.3652581Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:45.5328401Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:45.7005365Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:45.8681606Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:46.0355057Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:46.2034927Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:46.3709963Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:46.5384402Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:46.7061355Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:46.8738474Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:47.0414986Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:47.2093762Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:47.3770509Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:47.5447571Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 
2025-09-07T06:19:47.7127060Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:47.8803172Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:48.0481297Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:48.2155639Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:48.3835002Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12] 2025-09-07T06:19:48.5511255Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:48.7188895Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:48.8891476Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:49.1533459Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:49.3220993Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:49.4897632Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:49.6574431Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:49.8294231Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:50.0048553Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:50.1816653Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:50.3494640Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:50.5174413Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:50.6851554Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy] 2025-09-07T06:19:50.8539209Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:51.0234403Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:51.1915438Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:51.3597356Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:51.5273171Z  
━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:51.6959041Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:51.8639924Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:52.0386951Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx] 2025-09-07T06:19:52.2061705Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━ 19/29 [fsspec] 2025-09-07T06:19:52.3740150Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━ 20/29 [filelock] 2025-09-07T06:19:52.5416569Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:52.7094425Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:52.8773160Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:53.0449368Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:53.2124598Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:53.3800582Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:53.5476477Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:53.7151468Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:53.8825155Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:54.0500689Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:54.2177344Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:54.3854529Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:54.5533460Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:54.7211372Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:54.8886239Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 
2025-09-07T06:19:55.0559544Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:55.2238766Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:55.3914524Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:55.5590202Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:55.7269899Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:55.8944298Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:56.0621436Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12] 2025-09-07T06:19:56.2299191Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:56.3976290Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:56.5653288Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:56.7333287Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:56.9010167Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:57.0685818Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:57.2359394Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:57.4039024Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:57.5715673Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:57.7391823Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:57.9070047Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12] 2025-09-07T06:19:58.0742630Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━ 23/29 [nvidia-cudnn-cu12] 2025-09-07T06:19:58.2419451Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━ 23/29 [nvidia-cudnn-cu12] 2025-09-07T06:19:58.4096997Z  
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━ 23/29 [nvidia-cudnn-cu12] 2025-09-07T06:20:05.7890084Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━ 24/29 [jinja2] 2025-09-07T06:20:05.9568467Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━ 25/29 [nvidia-cusolver-cu12] 2025-09-07T06:20:09.8238804Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:20:34.3737473Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 27/29 [torchvision] 2025-09-07T06:20:35.2152580Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 28/29 [torchaudio] 2025-09-07T06:20:35.3433392Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29/29 [torchaudio] 2025-09-07T06:20:35.3433836Z [?25h 2025-09-07T06:20:35.3476293Z Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.7.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-2.3.2 nvidia-cublas-cu12-12.9.1.4 nvidia-cuda-cupti-cu12-12.9.79 nvidia-cuda-nvrtc-cu12-12.9.86 nvidia-cuda-runtime-cu12-12.9.79 
nvidia-cudnn-cu12-9.10.2.21 nvidia-cufft-cu12-11.4.1.4 nvidia-cufile-cu12-1.14.1.1 nvidia-curand-cu12-10.3.10.19 nvidia-cusolver-cu12-11.7.5.82 nvidia-cusparse-cu12-12.5.10.65 nvidia-cusparselt-cu12-0.7.1 nvidia-nccl-cu12-2.27.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvshmem-cu12-3.3.20 nvidia-nvtx-cu12-12.9.79 pillow-11.3.0 pytorch-triton-3.4.0+gitf7888497 sympy-1.14.0 torch-2.9.0.dev20250901+cu129 torchaudio-2.8.0.dev20250901+cu129 torchvision-0.24.0.dev20250901+cu129 typing-extensions-4.14.1 2025-09-07T06:20:35.3481973Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-09-07T06:20:35.7227867Z + docker exec -t 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 /opt/python/cp312-cp312/bin/python -mpip download --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129 2025-09-07T06:20:36.2432836Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu129 2025-09-07T06:20:36.4694401Z Collecting torch 2025-09-07T06:20:36.5313310Z Using cached https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250905%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-09-07T06:20:36.7332702Z Collecting torchvision 2025-09-07T06:20:36.7854435Z Using cached https://download.pytorch.org/whl/nightly/cu129/torchvision-0.24.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB) 2025-09-07T06:20:36.9331278Z Collecting torchaudio 2025-09-07T06:20:37.0197345Z Using cached https://download.pytorch.org/whl/nightly/cu129/torchaudio-2.8.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (7.3 kB) 2025-09-07T06:20:37.0522230Z Collecting filelock (from torch) 
2025-09-07T06:20:37.1094541Z Using cached https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl.metadata (2.1 kB) 2025-09-07T06:20:37.1371992Z Collecting typing-extensions>=4.10.0 (from torch) 2025-09-07T06:20:37.1998642Z Using cached https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB) 2025-09-07T06:20:37.2254021Z Collecting setuptools (from torch) 2025-09-07T06:20:37.2294573Z Downloading https://download.pytorch.org/whl/nightly/setuptools-78.1.0-py3-none-any.whl.metadata (6.6 kB) 2025-09-07T06:20:37.3811114Z Collecting sympy>=1.13.3 (from torch) 2025-09-07T06:20:37.4416884Z Using cached https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:20:37.4851927Z Collecting networkx>=2.5.1 (from torch) 2025-09-07T06:20:37.5813598Z Using cached https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl.metadata (6.3 kB) 2025-09-07T06:20:37.6455288Z Collecting jinja2 (from torch) 2025-09-07T06:20:37.6952575Z Using cached https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB) 2025-09-07T06:20:37.7232515Z Collecting fsspec>=0.8.5 (from torch) 2025-09-07T06:20:37.7883844Z Using cached https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:20:37.8517379Z Collecting nvidia-cuda-nvrtc-cu12==12.9.86 (from torch) 2025-09-07T06:20:37.9152659Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:20:37.9608739Z Collecting nvidia-cuda-runtime-cu12==12.9.79 (from torch) 2025-09-07T06:20:38.0266539Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:20:38.0733015Z Collecting nvidia-cuda-cupti-cu12==12.9.79 (from torch) 
2025-09-07T06:20:38.1345706Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:20:38.1646696Z Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch) 2025-09-07T06:20:38.2252532Z Using cached https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:20:38.2630390Z Collecting nvidia-cublas-cu12==12.9.1.4 (from torch) 2025-09-07T06:20:38.3165837Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:20:38.3522713Z Collecting nvidia-cufft-cu12==11.4.1.4 (from torch) 2025-09-07T06:20:38.4061339Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:20:38.4715120Z Collecting nvidia-curand-cu12==10.3.10.19 (from torch) 2025-09-07T06:20:38.5326391Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:20:38.5574141Z Collecting nvidia-cusolver-cu12==11.7.5.82 (from torch) 2025-09-07T06:20:38.6076959Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl.metadata (1.9 kB) 2025-09-07T06:20:38.7194914Z Collecting nvidia-cusparse-cu12==12.5.10.65 (from torch) 2025-09-07T06:20:38.7721847Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:20:38.8011483Z Collecting nvidia-cusparselt-cu12==0.7.1 (from torch) 2025-09-07T06:20:38.8625715Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB) 
2025-09-07T06:20:38.8901370Z Collecting nvidia-nccl-cu12==2.27.5 (from torch) 2025-09-07T06:20:38.9526268Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-09-07T06:20:38.9951754Z Collecting nvidia-nvshmem-cu12==3.3.20 (from torch) 2025-09-07T06:20:39.0586888Z Using cached https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB) 2025-09-07T06:20:39.0883184Z Collecting nvidia-nvtx-cu12==12.9.79 (from torch) 2025-09-07T06:20:39.1486102Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:20:39.1764059Z Collecting nvidia-nvjitlink-cu12==12.9.86 (from torch) 2025-09-07T06:20:39.2386982Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:20:39.2920663Z Collecting nvidia-cufile-cu12==1.14.1.1 (from torch) 2025-09-07T06:20:39.3463378Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:20:39.4286189Z Collecting pytorch-triton==3.4.0+gitf7888497 (from torch) 2025-09-07T06:20:39.4905914Z Using cached https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:20:39.5854984Z Collecting numpy (from torchvision) 2025-09-07T06:20:39.6374504Z Using cached https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB) 2025-09-07T06:20:39.6534351Z Collecting torch 2025-09-07T06:20:39.7126625Z Using cached 
https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-09-07T06:20:39.7665607Z Collecting pillow!=8.3.*,>=5.3.0 (from torchvision) 2025-09-07T06:20:39.8422322Z Using cached https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (9.0 kB) 2025-09-07T06:20:39.8867833Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-09-07T06:20:39.9353392Z Using cached https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-09-07T06:20:39.9897532Z Collecting MarkupSafe>=2.0 (from jinja2->torch) 2025-09-07T06:20:40.0463421Z Using cached https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB) 2025-09-07T06:20:40.1013807Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl (581.2 MB) 2025-09-07T06:20:40.4639660Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl (10.8 MB) 2025-09-07T06:20:40.5180593Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (89.6 MB) 2025-09-07T06:20:40.6256400Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.5 MB) 2025-09-07T06:20:40.6810787Z Using cached https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB) 2025-09-07T06:20:41.0979316Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.9 MB) 2025-09-07T06:20:41.2640821Z Using cached 
https://download.pytorch.org/whl/nightly/cu129/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-09-07T06:20:41.3193951Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl (68.3 MB) 2025-09-07T06:20:41.4095625Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl (338.1 MB) 2025-09-07T06:20:41.6439016Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (366.5 MB) 2025-09-07T06:20:41.8813754Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB) 2025-09-07T06:20:42.0797863Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB) 2025-09-07T06:20:42.2834040Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.7 MB) 2025-09-07T06:20:42.3524747Z Using cached https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB) 2025-09-07T06:20:42.4635800Z Using cached https://download.pytorch.org/whl/nightly/cu129/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (85 kB) 2025-09-07T06:20:42.5270344Z Using cached https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) 2025-09-07T06:20:42.6517295Z Using cached https://download.pytorch.org/whl/nightly/cu129/torchvision-0.24.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl (9.5 MB) 2025-09-07T06:20:42.6723516Z Using cached 
https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl (1254.1 MB) 2025-09-07T06:20:43.5812326Z Using cached https://download.pytorch.org/whl/nightly/cu129/torchaudio-2.8.0.dev20250901%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl (2.3 MB) 2025-09-07T06:20:43.6166404Z Using cached https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl (199 kB) 2025-09-07T06:20:43.6552873Z Using cached https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl (2.0 MB) 2025-09-07T06:20:43.7094597Z Using cached https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB) 2025-09-07T06:20:43.7192824Z Downloading https://download.pytorch.org/whl/nightly/setuptools-78.1.0-py3-none-any.whl (1.3 MB) 2025-09-07T06:20:43.8248964Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.3 MB ? eta -:--:-- 2025-09-07T06:20:43.8250079Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 9.8 MB/s 0:00:00 2025-09-07T06:20:43.8774545Z [?25hUsing cached https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl (6.3 MB) 2025-09-07T06:20:43.9256872Z Using cached https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl (43 kB) 2025-09-07T06:20:43.9749718Z Using cached https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl (15 kB) 2025-09-07T06:20:44.0206455Z Using cached https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl (134 kB) 2025-09-07T06:20:44.0841912Z Using cached https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB) 2025-09-07T06:20:44.1380158Z Using cached https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB) 2025-09-07T06:20:56.1165108Z Saved ./nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl 
2025-09-07T06:20:56.1215317Z Saved ./nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl 2025-09-07T06:20:56.1620651Z Saved ./nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl 2025-09-07T06:20:56.1643799Z Saved ./nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:56.4738394Z Saved ./nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl 2025-09-07T06:20:56.5623417Z Saved ./nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:56.5632568Z Saved ./nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:56.5938943Z Saved ./nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl 2025-09-07T06:20:56.7420190Z Saved ./nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl 2025-09-07T06:20:56.9021797Z Saved ./nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:57.0278598Z Saved ./nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl 2025-09-07T06:20:57.1689943Z Saved ./nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:57.1870087Z Saved ./nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl 2025-09-07T06:20:57.2414663Z Saved ./nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:57.2417778Z Saved ./nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl 2025-09-07T06:20:57.3107432Z Saved ./pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:20:57.3152095Z Saved ./torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:20:57.8650190Z Saved ./torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:20:57.8661137Z Saved 
./torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:20:57.8665485Z Saved ./fsspec-2025.7.0-py3-none-any.whl 2025-09-07T06:20:57.8683086Z Saved ./networkx-3.5-py3-none-any.whl 2025-09-07T06:20:57.8716817Z Saved ./pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:20:57.8722585Z Saved ./setuptools-78.1.0-py3-none-any.whl 2025-09-07T06:20:57.8754977Z Saved ./sympy-1.14.0-py3-none-any.whl 2025-09-07T06:20:57.8760580Z Saved ./mpmath-1.3.0-py3-none-any.whl 2025-09-07T06:20:57.8764285Z Saved ./typing_extensions-4.14.1-py3-none-any.whl 2025-09-07T06:20:57.8767441Z Saved ./filelock-3.19.1-py3-none-any.whl 2025-09-07T06:20:57.8771520Z Saved ./jinja2-3.1.6-py3-none-any.whl 2025-09-07T06:20:57.8775293Z Saved ./MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 2025-09-07T06:20:57.8853372Z Saved ./numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:20:57.8858761Z Successfully downloaded nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12 nvidia-cufft-cu12 nvidia-cufile-cu12 nvidia-curand-cu12 nvidia-cusolver-cu12 nvidia-cusparse-cu12 nvidia-cusparselt-cu12 nvidia-nccl-cu12 nvidia-nvjitlink-cu12 nvidia-nvshmem-cu12 nvidia-nvtx-cu12 pytorch-triton torchvision torch torchaudio fsspec networkx pillow setuptools sympy mpmath typing-extensions filelock jinja2 MarkupSafe numpy 2025-09-07T06:20:58.3201587Z + echo PYTHON_EXECUTABLE=/opt/python/cp312-cp312/bin/python 2025-09-07T06:20:58.3202306Z + echo container_name=80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T06:20:58.3268012Z Prepare all required actions 2025-09-07T06:20:58.3323394Z ##[group]Run ./.github/actions/build-external-packages 2025-09-07T06:20:58.3323762Z with: 2025-09-07T06:20:58.3324011Z build-targets: vllm 2025-09-07T06:20:58.3324334Z docker-image: pytorch/manylinux2_28-builder:cuda12.9 
2025-09-07T06:20:58.3324877Z cuda-arch-list: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:20:58.3325346Z torch-wheel-dir: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-09-07T06:20:58.3325948Z output-dir: /home/ec2-user/actions-runner/_work/_temp/artifacts/externals 2025-09-07T06:20:58.3326430Z cuda-version: 12.8.1 2025-09-07T06:20:58.3326687Z env: 2025-09-07T06:20:58.3326889Z PY_VERS: 3.12 2025-09-07T06:20:58.3327213Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:20:58.3327625Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:20:58.3327938Z BUILD_DEVICE: cu129 2025-09-07T06:20:58.3328261Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T06:20:58.3328907Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T06:20:58.3329420Z ##[endgroup] 2025-09-07T06:20:58.3474982Z ##[group]Run set -euo pipefail 2025-09-07T06:20:58.3475343Z set -euo pipefail 2025-09-07T06:20:58.3475630Z python3 --version 2025-09-07T06:20:58.3475893Z docker images 2025-09-07T06:20:58.3476169Z START_TIME=$(date +%s) 2025-09-07T06:20:58.3476436Z ( 2025-09-07T06:20:58.3476667Z  cd .ci/lumen_cli 2025-09-07T06:20:58.3476950Z  python3 -m pip install -e . 
2025-09-07T06:20:58.3477261Z ) 2025-09-07T06:20:58.3477500Z MAX_JOBS="$(nproc --ignore=6)" 2025-09-07T06:20:58.3477811Z export MAX_JOBS 2025-09-07T06:20:58.3478208Z  2025-09-07T06:20:58.3478504Z # Split the comma-separated list and build each target 2025-09-07T06:20:58.3478950Z IFS=',' read -ra TARGETS <<< "$BUILD_TARGETS" 2025-09-07T06:20:58.3479319Z for target in "${TARGETS[@]}"; do 2025-09-07T06:20:58.3479688Z  OUTPUT_DIR="$PARENT_OUTPUT_DIR/$target" 2025-09-07T06:20:58.3480035Z  export OUTPUT_DIR 2025-09-07T06:20:58.3480460Z  echo "Building external package: $target in directory $OUTPUT_DIR" 2025-09-07T06:20:58.3480970Z  python3 -m cli.run build external "$target" 2025-09-07T06:20:58.3481312Z done 2025-09-07T06:20:58.3481533Z  2025-09-07T06:20:58.3481740Z END_TIME=$(date +%s) 2025-09-07T06:20:58.3482013Z { 2025-09-07T06:20:58.3482272Z  echo "build_time=$((END_TIME - START_TIME))" 2025-09-07T06:20:58.3482660Z  if [ -d "$PARENT_OUTPUT_DIR" ]; then 2025-09-07T06:20:58.3483030Z  echo "output_dir=$PARENT_OUTPUT_DIR" 2025-09-07T06:20:58.3483352Z  fi 2025-09-07T06:20:58.3483586Z } >> "$GITHUB_OUTPUT" 2025-09-07T06:20:58.3494476Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:20:58.3494919Z env: 2025-09-07T06:20:58.3495141Z PY_VERS: 3.12 2025-09-07T06:20:58.3495491Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:20:58.3495922Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:20:58.3496250Z BUILD_DEVICE: cu129 2025-09-07T06:20:58.3496594Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T06:20:58.3497227Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T06:20:58.3497858Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-09-07T06:20:58.3498260Z SCCACHE_REGION: us-east-1 2025-09-07T06:20:58.3498565Z CUDA_VERSION: 12.8.1 2025-09-07T06:20:58.3498855Z TORCH_CUDA_ARCH_LIST: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:20:58.3499276Z BASE_IMAGE: 
pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:20:58.3499668Z BUILD_TARGETS: vllm 2025-09-07T06:20:58.3500136Z PARENT_OUTPUT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts/externals 2025-09-07T06:20:58.3500958Z TORCH_WHEELS_PATH: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-09-07T06:20:58.3501448Z ##[endgroup] 2025-09-07T06:20:58.3548571Z Python 3.9.23 2025-09-07T06:20:58.3678544Z REPOSITORY TAG IMAGE ID CREATED SIZE 2025-09-07T06:20:58.3679672Z pytorch/manylinux2_28-builder cuda12.9 b3ae6fca04b3 28 hours ago 18.6GB 2025-09-07T06:20:59.1413217Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T06:20:59.1589485Z Obtaining file:///home/ec2-user/actions-runner/_work/pytorch/pytorch/.ci/lumen_cli 2025-09-07T06:20:59.2882768Z Installing build dependencies: started 2025-09-07T06:21:02.6372723Z Installing build dependencies: finished with status 'done' 2025-09-07T06:21:02.6401574Z Checking if build backend supports build_editable: started 2025-09-07T06:21:02.7751040Z Checking if build backend supports build_editable: finished with status 'done' 2025-09-07T06:21:02.7758754Z Getting requirements to build editable: started 2025-09-07T06:21:02.9587604Z Getting requirements to build editable: finished with status 'done' 2025-09-07T06:21:02.9596041Z Preparing editable metadata (pyproject.toml): started 2025-09-07T06:21:03.1438147Z Preparing editable metadata (pyproject.toml): finished with status 'done' 2025-09-07T06:21:03.4527816Z Collecting pyyaml==6.0.2 2025-09-07T06:21:03.4693546Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-09-07T06:21:04.7049485Z Collecting uv==0.8.6 2025-09-07T06:21:04.7113458Z Downloading uv-0.8.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.3 MB) 2025-09-07T06:21:05.2919031Z Collecting docker==7.1.0 2025-09-07T06:21:05.2954512Z Downloading docker-7.1.0-py3-none-any.whl (147 kB) 2025-09-07T06:21:05.4955058Z Collecting 
GitPython==3.1.45 2025-09-07T06:21:05.4997063Z Downloading gitpython-3.1.45-py3-none-any.whl (208 kB) 2025-09-07T06:21:05.7780459Z Collecting pytest==7.3.2 2025-09-07T06:21:05.7820920Z Downloading pytest-7.3.2-py3-none-any.whl (320 kB) 2025-09-07T06:21:05.9443218Z Collecting urllib3>=1.26.0 2025-09-07T06:21:05.9476585Z Downloading urllib3-2.5.0-py3-none-any.whl (129 kB) 2025-09-07T06:21:06.0214354Z Collecting requests>=2.26.0 2025-09-07T06:21:06.0252834Z Downloading requests-2.32.5-py3-none-any.whl (64 kB) 2025-09-07T06:21:06.0662009Z Collecting gitdb<5,>=4.0.1 2025-09-07T06:21:06.0745303Z Downloading gitdb-4.0.12-py3-none-any.whl (62 kB) 2025-09-07T06:21:06.1256301Z Collecting typing-extensions>=3.10.0.2 2025-09-07T06:21:06.1292493Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-09-07T06:21:06.1909255Z Collecting packaging 2025-09-07T06:21:06.1939843Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-09-07T06:21:06.2369761Z Collecting tomli>=1.0.0 2025-09-07T06:21:06.2405929Z Downloading tomli-2.2.1-py3-none-any.whl (14 kB) 2025-09-07T06:21:06.2746769Z Collecting pluggy<2.0,>=0.12 2025-09-07T06:21:06.2781467Z Downloading pluggy-1.6.0-py3-none-any.whl (20 kB) 2025-09-07T06:21:06.2985542Z Collecting iniconfig 2025-09-07T06:21:06.3022719Z Downloading iniconfig-2.1.0-py3-none-any.whl (6.0 kB) 2025-09-07T06:21:06.3345370Z Collecting exceptiongroup>=1.0.0rc8 2025-09-07T06:21:06.3383101Z Downloading exceptiongroup-1.3.0-py3-none-any.whl (16 kB) 2025-09-07T06:21:06.3772381Z Collecting smmap<6,>=3.0.1 2025-09-07T06:21:06.3813411Z Downloading smmap-5.0.2-py3-none-any.whl (24 kB) 2025-09-07T06:21:06.3968565Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests>=2.26.0->docker==7.1.0->lumen-ci==0.1.0) (2.10) 2025-09-07T06:21:06.7647143Z Collecting charset_normalizer<4,>=2 2025-09-07T06:21:06.7686419Z Downloading 
charset_normalizer-3.4.3-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (152 kB) 2025-09-07T06:21:06.8219979Z Collecting certifi>=2017.4.17 2025-09-07T06:21:06.8255612Z Downloading certifi-2025.8.3-py3-none-any.whl (161 kB) 2025-09-07T06:21:06.8637272Z Building wheels for collected packages: lumen-ci 2025-09-07T06:21:06.8641388Z Building editable for lumen-ci (pyproject.toml): started 2025-09-07T06:21:07.0654471Z Building editable for lumen-ci (pyproject.toml): finished with status 'done' 2025-09-07T06:21:07.0660646Z Created wheel for lumen-ci: filename=lumen_ci-0.1.0-0.editable-py3-none-any.whl size=2721 sha256=d9720d2dd24affac971f1ba7f51a21ce7334f60f21e61b924b1b07990fece5e9 2025-09-07T06:21:07.0662025Z Stored in directory: /tmp/pip-ephem-wheel-cache-pke23vfk/wheels/99/21/02/221df53baf03cd937166e2aa8f8dff3cd05f5c929f2b22b56e 2025-09-07T06:21:07.0677188Z Successfully built lumen-ci 2025-09-07T06:21:07.2091788Z Installing collected packages: urllib3, typing-extensions, smmap, charset-normalizer, certifi, tomli, requests, pluggy, packaging, iniconfig, gitdb, exceptiongroup, uv, pyyaml, pytest, GitPython, docker, lumen-ci 2025-09-07T06:21:07.3486482Z WARNING: The script normalizer is installed in '/home/ec2-user/.local/bin' which is not on PATH. 2025-09-07T06:21:07.3487481Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T06:21:08.0433212Z WARNING: The scripts py.test and pytest are installed in '/home/ec2-user/.local/bin' which is not on PATH. 2025-09-07T06:21:08.0434217Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T06:21:08.2373095Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 
2025-09-07T06:21:08.2375185Z Successfully installed GitPython-3.1.45 certifi-2025.8.3 charset-normalizer-3.4.3 docker-7.1.0 exceptiongroup-1.3.0 gitdb-4.0.12 iniconfig-2.1.0 lumen-ci-0.1.0 packaging-25.0 pluggy-1.6.0 pytest-7.3.2 pyyaml-6.0.2 requests-2.32.5 smmap-5.0.2 tomli-2.2.1 typing-extensions-4.15.0 urllib3-2.5.0 uv-0.8.6 2025-09-07T06:21:08.2377128Z awscli 2.25.0 requires urllib3<1.27,>=1.25.4, but you have urllib3 2.5.0 which is incompatible. 2025-09-07T06:21:08.3376388Z Building external package: vllm in directory /home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm 2025-09-07T06:21:08.5569782Z 2025-09-07 06:21:08,556 [INFO] cli.lib.core.vllm.vllm_build: Running vllm build with inputs: VllmBuildParameters(use_torch_whl=True, torch_whls_path=PosixPath('/home/ec2-user/actions-runner/_work/_temp/artifacts'), use_local_base_image=True, base_image='pytorch/manylinux2_28-builder:cuda12.9', use_local_dockerfile=True, dockerfile_path=PosixPath('/home/ec2-user/actions-runner/_work/pytorch/pytorch/.github/ci_configs/vllm/Dockerfile.tmp_vllm'), output_dir=PosixPath('/home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm'), target_stage='export-wheels', tag_name='vllm-wheels', cuda_version='12.8.1', python_version='3.12', max_jobs='42', sccache_bucket='ossci-compiler-cache-circleci-v2', sccache_region='us-east-1', torch_cuda_arch_list='8.0;8.9;9.0;10.0;12.0') 2025-09-07T06:21:08.5574015Z 2025-09-07 06:21:08,556 [INFO] cli.lib.common.git_helper: Cloning vllm to vllm 2025-09-07T06:21:08.7051410Z 2025-09-07 06:21:08,704 [INFO] cli.lib.common.git_helper: Progress: 20% - remote: Counting objects: 20% (13/62) 2025-09-07T06:21:08.7052320Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 25% - remote: Counting objects: 25% (16/62) 2025-09-07T06:21:08.7053199Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 30% - remote: Counting objects: 30% (19/62) 2025-09-07T06:21:08.7054077Z 2025-09-07 06:21:08,705 [INFO] 
cli.lib.common.git_helper: Progress: 35% - remote: Counting objects: 35% (22/62) 2025-09-07T06:21:08.7054936Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 40% - remote: Counting objects: 40% (25/62) 2025-09-07T06:21:08.7055849Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 45% - remote: Counting objects: 45% (28/62) 2025-09-07T06:21:08.7056718Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 50% - remote: Counting objects: 50% (31/62) 2025-09-07T06:21:08.7057827Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 70% - remote: Counting objects: 70% (44/62) 2025-09-07T06:21:08.7058716Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 75% - remote: Counting objects: 75% (47/62) 2025-09-07T06:21:08.7059572Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 80% - remote: Counting objects: 80% (50/62) 2025-09-07T06:21:08.7060444Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 85% - remote: Counting objects: 85% (53/62) 2025-09-07T06:21:08.7061318Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 90% - remote: Counting objects: 90% (56/62) 2025-09-07T06:21:08.7062170Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 95% - remote: Counting objects: 95% (59/62) 2025-09-07T06:21:08.7063169Z 2025-09-07 06:21:08,705 [INFO] cli.lib.common.git_helper: Progress: 100% - remote: Counting objects: 100% (62/62) 2025-09-07T06:21:08.7067194Z 2025-09-07 06:21:08,706 [INFO] cli.lib.common.git_helper: Progress: 10% - remote: Compressing objects: 10% (5/47) 2025-09-07T06:21:08.7103631Z 2025-09-07 06:21:08,710 [INFO] cli.lib.common.git_helper: Progress: 25% - remote: Compressing objects: 25% (12/47) 2025-09-07T06:21:08.7106003Z 2025-09-07 06:21:08,710 [INFO] cli.lib.common.git_helper: Progress: 40% - remote: Compressing objects: 40% (19/47) 2025-09-07T06:21:08.7107942Z 2025-09-07 06:21:08,710 [INFO] 
cli.lib.common.git_helper: Progress: 55% - remote: Compressing objects: 55% (26/47) 2025-09-07T06:21:08.7108828Z 2025-09-07 06:21:08,710 [INFO] cli.lib.common.git_helper: Progress: 65% - remote: Compressing objects: 65% (31/47) 2025-09-07T06:21:08.7109706Z 2025-09-07 06:21:08,710 [INFO] cli.lib.common.git_helper: Progress: 70% - remote: Compressing objects: 70% (33/47) 2025-09-07T06:21:08.7110844Z 2025-09-07 06:21:08,710 [INFO] cli.lib.common.git_helper: Progress: 80% - remote: Compressing objects: 80% (38/47) 2025-09-07T06:21:08.7111731Z 2025-09-07 06:21:08,710 [INFO] cli.lib.common.git_helper: Progress: 85% - remote: Compressing objects: 85% (40/47) 2025-09-07T06:21:08.7112600Z 2025-09-07 06:21:08,711 [INFO] cli.lib.common.git_helper: Progress: 95% - remote: Compressing objects: 95% (45/47) 2025-09-07T06:21:08.7113481Z 2025-09-07 06:21:08,711 [INFO] cli.lib.common.git_helper: Progress: 100% - remote: Compressing objects: 100% (47/47) 2025-09-07T06:21:08.7328912Z 2025-09-07 06:21:08,732 [INFO] cli.lib.common.git_helper: Progress: 0% - Receiving objects: 0% (1/110147) 2025-09-07T06:21:08.8345755Z 2025-09-07 06:21:08,834 [INFO] cli.lib.common.git_helper: Progress: 5% - Receiving objects: 5% (5508/110147) 2025-09-07T06:21:09.0436488Z 2025-09-07 06:21:09,043 [INFO] cli.lib.common.git_helper: Progress: 10% - Receiving objects: 10% (11015/110147) 2025-09-07T06:21:09.2773997Z 2025-09-07 06:21:09,277 [INFO] cli.lib.common.git_helper: Progress: 15% - Receiving objects: 15% (16523/110147), 20.40 MiB | 40.78 MiB/s 2025-09-07T06:21:09.4265658Z 2025-09-07 06:21:09,426 [INFO] cli.lib.common.git_helper: Progress: 20% - Receiving objects: 20% (22030/110147), 20.40 MiB | 40.78 MiB/s 2025-09-07T06:21:09.5543128Z 2025-09-07 06:21:09,554 [INFO] cli.lib.common.git_helper: Progress: 25% - Receiving objects: 25% (27537/110147), 20.40 MiB | 40.78 MiB/s 2025-09-07T06:21:09.6593265Z 2025-09-07 06:21:09,659 [INFO] cli.lib.common.git_helper: Progress: 30% - Receiving objects: 30% 
(33045/110147), 20.40 MiB | 40.78 MiB/s 2025-09-07T06:21:09.7686711Z 2025-09-07 06:21:09,768 [INFO] cli.lib.common.git_helper: Progress: 35% - Receiving objects: 35% (38552/110147), 42.14 MiB | 42.13 MiB/s 2025-09-07T06:21:09.8812374Z 2025-09-07 06:21:09,880 [INFO] cli.lib.common.git_helper: Progress: 40% - Receiving objects: 40% (44059/110147), 42.14 MiB | 42.13 MiB/s 2025-09-07T06:21:09.9732611Z 2025-09-07 06:21:09,972 [INFO] cli.lib.common.git_helper: Progress: 45% - Receiving objects: 45% (49567/110147), 42.14 MiB | 42.13 MiB/s 2025-09-07T06:21:10.0573653Z 2025-09-07 06:21:10,057 [INFO] cli.lib.common.git_helper: Progress: 50% - Receiving objects: 50% (55074/110147), 42.14 MiB | 42.13 MiB/s 2025-09-07T06:21:10.1568158Z 2025-09-07 06:21:10,156 [INFO] cli.lib.common.git_helper: Progress: 55% - Receiving objects: 55% (60581/110147), 42.14 MiB | 42.13 MiB/s 2025-09-07T06:21:10.2230036Z 2025-09-07 06:21:10,222 [INFO] cli.lib.common.git_helper: Progress: 60% - Receiving objects: 60% (66089/110147), 42.14 MiB | 42.13 MiB/s 2025-09-07T06:21:10.2799827Z 2025-09-07 06:21:10,279 [INFO] cli.lib.common.git_helper: Progress: 65% - Receiving objects: 65% (71596/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.3047953Z 2025-09-07 06:21:10,304 [INFO] cli.lib.common.git_helper: Progress: 70% - Receiving objects: 70% (77103/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.3360202Z 2025-09-07 06:21:10,335 [INFO] cli.lib.common.git_helper: Progress: 75% - Receiving objects: 75% (82611/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.3628430Z 2025-09-07 06:21:10,362 [INFO] cli.lib.common.git_helper: Progress: 80% - Receiving objects: 80% (88118/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.4047148Z 2025-09-07 06:21:10,404 [INFO] cli.lib.common.git_helper: Progress: 85% - Receiving objects: 85% (93625/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.4334566Z 2025-09-07 06:21:10,433 [INFO] cli.lib.common.git_helper: Progress: 90% - Receiving objects: 90% 
(99133/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.4995580Z 2025-09-07 06:21:10,499 [INFO] cli.lib.common.git_helper: Progress: 95% - Receiving objects: 95% (104640/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.5378166Z 2025-09-07 06:21:10,537 [INFO] cli.lib.common.git_helper: Progress: 100% - Receiving objects: 100% (110147/110147), 64.56 MiB | 43.03 MiB/s 2025-09-07T06:21:10.5640532Z 2025-09-07 06:21:10,563 [INFO] cli.lib.common.git_helper: Resolving deltas: 0% (0/87202) 2025-09-07T06:21:10.6141678Z 2025-09-07 06:21:10,614 [INFO] cli.lib.common.git_helper: Progress: 5% - Resolving deltas: 5% (4370/87202) 2025-09-07T06:21:10.6762812Z 2025-09-07 06:21:10,676 [INFO] cli.lib.common.git_helper: Progress: 10% - Resolving deltas: 10% (8721/87202) 2025-09-07T06:21:10.7419301Z 2025-09-07 06:21:10,741 [INFO] cli.lib.common.git_helper: Progress: 15% - Resolving deltas: 15% (13081/87202) 2025-09-07T06:21:10.8365902Z 2025-09-07 06:21:10,836 [INFO] cli.lib.common.git_helper: Progress: 20% - Resolving deltas: 20% (17441/87202) 2025-09-07T06:21:10.8876478Z 2025-09-07 06:21:10,887 [INFO] cli.lib.common.git_helper: Progress: 25% - Resolving deltas: 25% (21805/87202) 2025-09-07T06:21:10.9199441Z 2025-09-07 06:21:10,919 [INFO] cli.lib.common.git_helper: Progress: 30% - Resolving deltas: 30% (26161/87202) 2025-09-07T06:21:10.9603574Z 2025-09-07 06:21:10,960 [INFO] cli.lib.common.git_helper: Progress: 35% - Resolving deltas: 35% (30521/87202) 2025-09-07T06:21:11.0149202Z 2025-09-07 06:21:11,014 [INFO] cli.lib.common.git_helper: Progress: 40% - Resolving deltas: 40% (34884/87202) 2025-09-07T06:21:11.0638862Z 2025-09-07 06:21:11,063 [INFO] cli.lib.common.git_helper: Progress: 45% - Resolving deltas: 45% (39241/87202) 2025-09-07T06:21:11.0962503Z 2025-09-07 06:21:11,096 [INFO] cli.lib.common.git_helper: Progress: 50% - Resolving deltas: 50% (43608/87202) 2025-09-07T06:21:11.1251985Z 2025-09-07 06:21:11,125 [INFO] cli.lib.common.git_helper: Progress: 55% - Resolving 
deltas: 55% (47962/87202) 2025-09-07T06:21:11.1587064Z 2025-09-07 06:21:11,158 [INFO] cli.lib.common.git_helper: Progress: 60% - Resolving deltas: 60% (52322/87202) 2025-09-07T06:21:11.1920573Z 2025-09-07 06:21:11,191 [INFO] cli.lib.common.git_helper: Progress: 65% - Resolving deltas: 65% (56682/87202) 2025-09-07T06:21:11.2381320Z 2025-09-07 06:21:11,237 [INFO] cli.lib.common.git_helper: Progress: 70% - Resolving deltas: 70% (61044/87202) 2025-09-07T06:21:11.2671159Z 2025-09-07 06:21:11,266 [INFO] cli.lib.common.git_helper: Progress: 75% - Resolving deltas: 75% (65402/87202) 2025-09-07T06:21:11.3044458Z 2025-09-07 06:21:11,304 [INFO] cli.lib.common.git_helper: Progress: 80% - Resolving deltas: 80% (69763/87202) 2025-09-07T06:21:11.3315335Z 2025-09-07 06:21:11,331 [INFO] cli.lib.common.git_helper: Progress: 85% - Resolving deltas: 85% (74122/87202) 2025-09-07T06:21:11.3596000Z 2025-09-07 06:21:11,359 [INFO] cli.lib.common.git_helper: Progress: 90% - Resolving deltas: 90% (78482/87202) 2025-09-07T06:21:11.3852290Z 2025-09-07 06:21:11,385 [INFO] cli.lib.common.git_helper: Progress: 95% - Resolving deltas: 95% (82842/87202) 2025-09-07T06:21:11.4065622Z 2025-09-07 06:21:11,406 [INFO] cli.lib.common.git_helper: Progress: 100% - Resolving deltas: 100% (87202/87202) 2025-09-07T06:21:11.9286843Z 2025-09-07 06:21:11,928 [INFO] cli.lib.common.git_helper: Checking out pinned vllm commit 4172235ab78b09989fb56edaf734dbee283dda3e 2025-09-07T06:21:12.0908292Z 2025-09-07 06:21:12,090 [INFO] cli.lib.common.git_helper: Successfully cloned vllm 2025-09-07T06:21:13.9178089Z 2025-09-07 06:21:13,917 [INFO] cli.lib.core.vllm.vllm_build: Running docker build: 2025-09-07T06:21:13.9181744Z docker buildx build --output type=local,dest=/home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm -f docker/Dockerfile.nightly_torch --pull=false --build-arg TORCH_WHEELS_PATH=tmp --build-arg BUILD_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.9 --build-arg 
FINAL_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.9 --build-arg max_jobs=42 --build-arg CUDA_VERSION=12.8.1 --build-arg PYTHON_VERSION=3.12 --build-arg USE_SCCACHE=1 --build-arg SCCACHE_BUCKET_NAME=ossci-compiler-cache-circleci-v2 --build-arg SCCACHE_REGION_NAME=us-east-1 --build-arg torch_cuda_arch_list='8.0;8.9;9.0;10.0;12.0' --target export-wheels -t vllm-wheels --progress=plain . 2025-09-07T06:21:13.9189552Z 2025-09-07 06:21:13,918 [INFO] cli.lib.common.utils: [cmd] docker buildx build --output type=local,dest=/home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm -f docker/Dockerfile.nightly_torch --pull=false --build-arg TORCH_WHEELS_PATH=tmp --build-arg BUILD_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.9 --build-arg FINAL_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.9 --build-arg max_jobs=42 --build-arg CUDA_VERSION=12.8.1 --build-arg PYTHON_VERSION=3.12 --build-arg USE_SCCACHE=1 --build-arg SCCACHE_BUCKET_NAME=ossci-compiler-cache-circleci-v2 --build-arg SCCACHE_REGION_NAME=us-east-1 --build-arg torch_cuda_arch_list=8.0;8.9;9.0;10.0;12.0 --target export-wheels -t vllm-wheels --progress=plain . 2025-09-07T06:21:14.2451270Z #0 building with "default" instance using docker driver 2025-09-07T06:21:14.2451631Z 2025-09-07T06:21:14.2451897Z #1 [internal] load build definition from Dockerfile.nightly_torch 2025-09-07T06:21:14.2452399Z #1 transferring dockerfile: 18.57kB done 2025-09-07T06:21:14.2452749Z #1 DONE 0.0s 2025-09-07T06:21:14.2452912Z 2025-09-07T06:21:14.2453214Z #2 [internal] load metadata for docker.io/pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:21:14.2453735Z #2 DONE 0.0s 2025-09-07T06:21:14.2453871Z 2025-09-07T06:21:14.2453988Z #3 [internal] load .dockerignore 2025-09-07T06:21:14.2454335Z #3 transferring context: 442B done 2025-09-07T06:21:14.2454649Z #3 DONE 0.0s 2025-09-07T06:21:14.2454799Z 2025-09-07T06:21:14.2454912Z #4 [internal] load build context 2025-09-07T06:21:14.3456582Z #4 ... 
2025-09-07T06:21:14.3456881Z 2025-09-07T06:21:14.3457312Z #5 [vllm-base 1/18] FROM docker.io/pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T06:21:14.3458014Z #5 DONE 0.2s 2025-09-07T06:21:14.3458238Z 2025-09-07T06:21:14.3458411Z #4 [internal] load build context 2025-09-07T06:21:19.2485993Z #4 transferring context: 1.18GB 5.1s 2025-09-07T06:21:24.2522123Z #4 transferring context: 2.38GB 10.1s 2025-09-07T06:21:24.4523042Z #4 ... 2025-09-07T06:21:24.4523253Z 2025-09-07T06:21:24.4523404Z #6 [vllm-base 2/18] WORKDIR /workspace 2025-09-07T06:21:24.5524228Z #6 ... 2025-09-07T06:21:24.5524702Z 2025-09-07T06:21:24.5525774Z #7 [base 2/20] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:24.5527188Z #7 1.538 Last metadata expiration check: 1 day, 4:16:05 ago on Sat 06 Sep 2025 02:05:10 AM UTC. 2025-09-07T06:21:24.5527838Z #7 2.278 Package git-2.43.7-1.el8_10.x86_64 is already installed. 2025-09-07T06:21:24.5528410Z #7 2.279 Package curl-7.61.1-34.el8_10.3.x86_64 is already installed. 2025-09-07T06:21:24.5528948Z #7 2.280 Package wget-1.19.5-12.el8_10.x86_64 is already installed. 2025-09-07T06:21:24.5529405Z #7 2.516 Dependencies resolved. 
2025-09-07T06:21:24.5529748Z #7 2.518 ================================================================================ 2025-09-07T06:21:24.5530269Z #7 2.518 Package Arch Version Repository Size 2025-09-07T06:21:24.5530775Z #7 2.518 ================================================================================ 2025-09-07T06:21:24.5531235Z #7 2.518 Installing: 2025-09-07T06:21:24.5531786Z #7 2.518 sudo x86_64 1.9.5p2-1.el8_10.2 baseos 1.0 M 2025-09-07T06:21:24.5532366Z #7 2.518 vim-enhanced x86_64 2:8.0.1763-19.el8_6.4 appstream 1.4 M 2025-09-07T06:21:24.5532872Z #7 2.518 Installing dependencies: 2025-09-07T06:21:24.5533313Z #7 2.518 gpm-libs x86_64 1.20.7-17.el8 appstream 38 k 2025-09-07T06:21:24.5533908Z #7 2.518 vim-common x86_64 2:8.0.1763-19.el8_6.4 appstream 6.3 M 2025-09-07T06:21:24.5534542Z #7 2.518 vim-filesystem noarch 2:8.0.1763-19.el8_6.4 appstream 49 k 2025-09-07T06:21:24.5535167Z #7 2.518 2025-09-07T06:21:24.5535420Z #7 2.518 Transaction Summary 2025-09-07T06:21:24.5535769Z #7 2.518 ================================================================================ 2025-09-07T06:21:24.5536176Z #7 2.518 Install 5 Packages 2025-09-07T06:21:24.5536460Z #7 2.518 2025-09-07T06:21:24.5536715Z #7 2.518 Total download size: 8.8 M 2025-09-07T06:21:24.5537050Z #7 2.518 Installed size: 34 M 2025-09-07T06:21:24.5537378Z #7 2.519 Downloading Packages: 2025-09-07T06:21:24.5537845Z #7 2.785 (1/5): gpm-libs-1.20.7-17.el8.x86_64.rpm 201 kB/s | 38 kB 00:00 2025-09-07T06:21:24.5538461Z #7 2.906 (2/5): sudo-1.9.5p2-1.el8_10.2.x86_64.rpm 3.4 MB/s | 1.0 MB 00:00 2025-09-07T06:21:24.5539121Z #7 2.953 (3/5): vim-filesystem-8.0.1763-19.el8_6.4.noarc 1.0 MB/s | 49 kB 00:00 2025-09-07T06:21:24.5539790Z #7 3.006 (4/5): vim-enhanced-8.0.1763-19.el8_6.4.x86_64. 
6.2 MB/s | 1.4 MB 00:00 2025-09-07T06:21:24.5540446Z #7 3.246 (5/5): vim-common-8.0.1763-19.el8_6.4.x86_64.rp 9.8 MB/s | 6.3 MB 00:00 2025-09-07T06:21:24.5541050Z #7 3.246 -------------------------------------------------------------------------------- 2025-09-07T06:21:24.5541582Z #7 3.246 Total 12 MB/s | 8.8 MB 00:00 2025-09-07T06:21:24.5542011Z #7 3.412 Running transaction check 2025-09-07T06:21:24.5542364Z #7 3.433 Transaction check succeeded. 2025-09-07T06:21:24.5542720Z #7 3.434 Running transaction test 2025-09-07T06:21:24.5543182Z #7 3.571 Transaction test succeeded. 2025-09-07T06:21:24.5543520Z #7 3.574 Running transaction 2025-09-07T06:21:24.5543899Z #7 4.207 Preparing : 1/1 2025-09-07T06:21:24.5544491Z #7 5.237 Installing : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 1/5 2025-09-07T06:21:24.5545122Z #7 6.239 Installing : vim-common-2:8.0.1763-19.el8_6.4.x86_64 2/5 2025-09-07T06:21:24.5545759Z #7 7.445 Installing : gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:21:24.5546490Z #7 7.786 Running scriptlet: gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:21:24.5547142Z #7 7.956 Installing : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:21:24.5547773Z #7 8.564 Installing : sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:21:24.5548405Z #7 9.367 Running scriptlet: sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:21:24.6532351Z #7 9.442 Running scriptlet: vim-common-2:8.0.1763-19.el8_6.4.x86_64 5/5 2025-09-07T06:21:24.6532897Z #7 ... 
2025-09-07T06:21:24.6533047Z 2025-09-07T06:21:24.6533170Z #4 [internal] load build context 2025-09-07T06:21:29.2557562Z #4 transferring context: 3.58GB 15.1s 2025-09-07T06:21:34.0553721Z #4 transferring context: 4.72GB 19.8s done 2025-09-07T06:21:34.3750992Z #4 DONE 20.2s 2025-09-07T06:21:34.3751466Z 2025-09-07T06:21:34.3754155Z #7 [base 2/20] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:34.3757104Z #7 9.442 Running scriptlet: vim-common-2:8.0.1763-19.el8_6.4.x86_64 5/5 2025-09-07T06:21:34.3757784Z #7 11.91 Verifying : sudo-1.9.5p2-1.el8_10.2.x86_64 1/5 2025-09-07T06:21:34.3758396Z #7 11.91 Verifying : gpm-libs-1.20.7-17.el8.x86_64 2/5 2025-09-07T06:21:34.3759029Z #7 11.91 Verifying : vim-common-2:8.0.1763-19.el8_6.4.x86_64 3/5 2025-09-07T06:21:34.3759643Z #7 11.91 Verifying : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:21:34.3760287Z #7 11.91 Verifying : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 5/5 2025-09-07T06:21:34.3761128Z #7 12.18 2025-09-07T06:21:34.3761354Z #7 12.18 Installed: 2025-09-07T06:21:34.3761761Z #7 12.18 gpm-libs-1.20.7-17.el8.x86_64 2025-09-07T06:21:34.3762336Z #7 12.18 sudo-1.9.5p2-1.el8_10.2.x86_64 2025-09-07T06:21:34.3762943Z #7 12.18 vim-common-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:21:34.3763561Z #7 12.18 vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:21:34.3764211Z #7 12.18 vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 2025-09-07T06:21:34.3764708Z #7 12.18 2025-09-07T06:21:34.3764937Z #7 12.18 Complete! 2025-09-07T06:21:34.3765214Z #7 12.47 Python 3.12.11 2025-09-07T06:21:34.3765709Z #7 12.69 pip 25.2 from /opt/python/cp312-cp312/lib/python3.12/site-packages/pip (python 3.12) 2025-09-07T06:23:01.2104404Z #7 ... 
2025-09-07T06:23:01.2104702Z 2025-09-07T06:23:01.2104851Z #6 [vllm-base 2/18] WORKDIR /workspace 2025-09-07T06:23:01.2105255Z #6 DONE 106.9s 2025-09-07T06:23:01.3697154Z 2025-09-07T06:23:01.3700378Z #7 [base 2/20] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:23:01.3703794Z #7 DONE 106.9s 2025-09-07T06:23:01.3703945Z 2025-09-07T06:23:01.3704237Z #8 [base 3/20] RUN ldconfig /usr/local/cuda-$(echo 12.8.1 | cut -d. -f1,2)/compat/ 2025-09-07T06:23:01.9813334Z #8 DONE 0.8s 2025-09-07T06:23:01.9813551Z 2025-09-07T06:23:01.9816741Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:23:02.7295416Z #9 1.513 Last metadata expiration check: 1 day, 4:17:52 ago on Sat 06 Sep 2025 02:05:10 AM UTC. 2025-09-07T06:23:03.3851693Z #9 2.169 Package git-2.43.7-1.el8_10.x86_64 is already installed. 2025-09-07T06:23:03.5345255Z #9 2.169 Package curl-7.61.1-34.el8_10.3.x86_64 is already installed. 2025-09-07T06:23:03.5345845Z #9 2.170 Package wget-1.19.5-12.el8_10.x86_64 is already installed. 2025-09-07T06:23:03.5346300Z #9 2.227 Dependencies resolved. 
2025-09-07T06:23:03.5346691Z #9 2.228 ================================================================================ 2025-09-07T06:23:03.5347204Z #9 2.228 Package Arch Version Repository Size 2025-09-07T06:23:03.5347726Z #9 2.228 ================================================================================ 2025-09-07T06:23:03.5348097Z #9 2.228 Installing: 2025-09-07T06:23:03.5348470Z #9 2.228 sudo x86_64 1.9.5p2-1.el8_10.2 baseos 1.0 M 2025-09-07T06:23:03.5349414Z #9 2.228 vim-enhanced x86_64 2:8.0.1763-19.el8_6.4 appstream 1.4 M 2025-09-07T06:23:03.5349939Z #9 2.228 Installing dependencies: 2025-09-07T06:23:03.5350396Z #9 2.228 gpm-libs x86_64 1.20.7-17.el8 appstream 38 k 2025-09-07T06:23:03.5350981Z #9 2.228 vim-common x86_64 2:8.0.1763-19.el8_6.4 appstream 6.3 M 2025-09-07T06:23:03.5351621Z #9 2.228 vim-filesystem noarch 2:8.0.1763-19.el8_6.4 appstream 49 k 2025-09-07T06:23:03.5352362Z #9 2.228 2025-09-07T06:23:03.5352615Z #9 2.228 Transaction Summary 2025-09-07T06:23:03.5352961Z #9 2.228 ================================================================================ 2025-09-07T06:23:03.5353379Z #9 2.228 Install 5 Packages 2025-09-07T06:23:03.5353660Z #9 2.228 2025-09-07T06:23:03.5353909Z #9 2.229 Total download size: 8.8 M 2025-09-07T06:23:03.5354250Z #9 2.229 Installed size: 34 M 2025-09-07T06:23:03.5354559Z #9 2.229 Downloading Packages: 2025-09-07T06:23:03.6686518Z #9 2.375 (1/5): gpm-libs-1.20.7-17.el8.x86_64.rpm 2.9 MB/s | 38 kB 00:00 2025-09-07T06:23:03.6687220Z #9 2.393 (2/5): sudo-1.9.5p2-1.el8_10.2.x86_64.rpm 34 MB/s | 1.0 MB 00:00 2025-09-07T06:23:03.6687878Z #9 2.400 (3/5): vim-filesystem-8.0.1763-19.el8_6.4.noarc 7.3 MB/s | 49 kB 00:00 2025-09-07T06:23:03.6688542Z #9 2.410 (4/5): vim-enhanced-8.0.1763-19.el8_6.4.x86_64. 
40 MB/s | 1.4 MB 00:00 2025-09-07T06:23:03.6689167Z #9 2.452 (5/5): vim-common-8.0.1763-19.el8_6.4.x86_64.rp 71 MB/s | 6.3 MB 00:00 2025-09-07T06:23:03.6689811Z #9 2.452 -------------------------------------------------------------------------------- 2025-09-07T06:23:03.7853413Z #9 2.452 Total 40 MB/s | 8.8 MB 00:00 2025-09-07T06:23:03.7853926Z #9 2.569 Running transaction check 2025-09-07T06:23:03.9177402Z #9 2.588 Transaction check succeeded. 2025-09-07T06:23:03.9177863Z #9 2.588 Running transaction test 2025-09-07T06:23:03.9178214Z #9 2.701 Transaction test succeeded. 2025-09-07T06:23:04.0255960Z #9 2.704 Running transaction 2025-09-07T06:23:04.1432115Z 2025-09-07T06:23:04.1432490Z #9 ... 2025-09-07T06:23:04.1432676Z 2025-09-07T06:23:04.1433316Z #10 [base 4/20] RUN --mount=type=cache,target=/root/.cache/uv if ! python3 -m uv --version >/dev/null 2>&1; then python3 -m pip install uv==0.8.4; fi 2025-09-07T06:23:04.1434105Z #10 1.551 Collecting uv==0.8.4 2025-09-07T06:23:04.1434684Z #10 1.567 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB) 2025-09-07T06:23:04.1435530Z #10 1.580 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.8 MB) 2025-09-07T06:23:04.1436759Z #10 1.687 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.8/18.8 MB 186.3 MB/s 0:00:00 2025-09-07T06:23:04.1437219Z #10 1.752 Installing collected packages: uv 2025-09-07T06:23:04.1437616Z #10 2.042 Successfully installed uv-0.8.4 2025-09-07T06:23:04.1439548Z #10 2.042 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-09-07T06:23:04.1441632Z #10 DONE 2.2s 2025-09-07T06:23:04.1441791Z 2025-09-07T06:23:04.1444526Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:23:04.2753665Z #9 2.859 Preparing : 1/1 2025-09-07T06:23:04.2754151Z #9 ... 2025-09-07T06:23:04.2754286Z 2025-09-07T06:23:04.2754406Z #11 [base 5/20] WORKDIR /workspace 2025-09-07T06:23:04.2754743Z #11 DONE 0.0s 2025-09-07T06:23:04.2754890Z 2025-09-07T06:23:04.2757998Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:23:04.2762785Z #9 2.859 Preparing : 1/1 2025-09-07T06:23:04.4174151Z #9 3.059 Installing : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 1/5 2025-09-07T06:23:04.4174709Z #9 ... 
2025-09-07T06:23:04.4174842Z 2025-09-07T06:23:04.4175104Z #12 [base 6/20] COPY requirements/common.txt requirements/common.txt 2025-09-07T06:23:04.4175564Z #12 DONE 0.3s 2025-09-07T06:23:04.6373731Z 2025-09-07T06:23:04.6374537Z #13 [base 7/20] COPY use_existing_torch.py use_existing_torch.py 2025-09-07T06:23:04.6375055Z #13 DONE 0.0s 2025-09-07T06:23:04.6375209Z 2025-09-07T06:23:04.6375389Z #14 [base 8/20] COPY pyproject.toml pyproject.toml 2025-09-07T06:23:04.6375772Z #14 DONE 0.0s 2025-09-07T06:23:04.6375919Z 2025-09-07T06:23:04.6376080Z #15 [base 9/20] RUN python3 use_existing_torch.py 2025-09-07T06:23:05.0225550Z #15 0.536 >>> cleaning requirements/common.txt 2025-09-07T06:23:05.0226054Z #15 0.536 <<< done cleaning requirements/common.txt 2025-09-07T06:23:05.0226421Z #15 0.536 2025-09-07T06:23:05.0226686Z #15 0.536 >>> cleaning pyproject.toml 2025-09-07T06:23:05.0227023Z #15 0.536 removed: 2025-09-07T06:23:05.0227272Z #15 0.536 "torch == 2.8.0", 2025-09-07T06:23:05.0227599Z #15 0.536 <<< done cleaning pyproject.toml 2025-09-07T06:23:05.0227962Z #15 0.536 2025-09-07T06:23:05.2443112Z #15 DONE 0.6s 2025-09-07T06:23:05.2443305Z 2025-09-07T06:23:05.2446332Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:23:05.2449891Z #9 3.059 Installing : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 1/5 2025-09-07T06:23:05.2450571Z #9 3.841 Installing : 
vim-common-2:8.0.1763-19.el8_6.4.x86_64 2/5 2025-09-07T06:23:05.2451312Z #9 3.876 Installing : gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:23:05.2451987Z #9 3.890 Running scriptlet: gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:23:05.3774002Z #9 4.028 Installing : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:23:05.6056327Z #9 4.161 Installing : sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:23:05.6058123Z #9 4.183 Running scriptlet: sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:23:07.5974202Z #9 4.238 Running scriptlet: vim-common-2:8.0.1763-19.el8_6.4.x86_64 5/5 2025-09-07T06:23:07.5974947Z #9 6.381 Verifying : sudo-1.9.5p2-1.el8_10.2.x86_64 1/5 2025-09-07T06:23:07.5975589Z #9 6.381 Verifying : gpm-libs-1.20.7-17.el8.x86_64 2/5 2025-09-07T06:23:07.7479023Z #9 6.381 Verifying : vim-common-2:8.0.1763-19.el8_6.4.x86_64 3/5 2025-09-07T06:23:07.7479786Z #9 6.381 Verifying : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:23:07.7812701Z #9 6.381 Verifying : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 5/5 2025-09-07T06:23:07.7813267Z #9 6.565 2025-09-07T06:23:07.7813523Z #9 6.565 Installed: 2025-09-07T06:23:07.7813926Z #9 6.565 gpm-libs-1.20.7-17.el8.x86_64 2025-09-07T06:23:07.7814538Z #9 6.565 sudo-1.9.5p2-1.el8_10.2.x86_64 2025-09-07T06:23:07.7815168Z #9 6.565 vim-common-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:23:07.7815831Z #9 6.565 vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:23:07.7816487Z #9 6.565 vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 2025-09-07T06:23:07.7817031Z #9 6.565 2025-09-07T06:23:07.7817260Z #9 6.565 Complete! 2025-09-07T06:23:08.0055157Z #9 6.638 Python 3.12.11 2025-09-07T06:23:08.0500319Z #9 6.833 pip 25.2 from /opt/python/cp312-cp312/lib/python3.12/site-packages/pip (python 3.12) 2025-09-07T06:23:15.1286847Z #9 ... 
2025-09-07T06:23:15.1287212Z 2025-09-07T06:23:15.1292228Z #16 [base 10/20] RUN --mount=type=bind,source=tmp,target=/dist --mount=type=cache,target=/root/.cache/uv if [ -n "tmp" ] && [ "tmp" != "./requirements" ] && [ -d "/dist" ] && ls /dist/torch*.whl >/dev/null 2>&1; then echo "[INFO] Installing torch wheels to build vllm"; torch_whl=$(find /dist -maxdepth 1 -name 'torch-*.whl' -print -quit); vision_whl=$(find /dist -name 'torchvision*.whl' | head -n1 | xargs); audio_whl=$(find /dist -name 'torchaudio*.whl' | head -n1 | xargs); uv pip install --system "${torch_whl}[opt-einsum]" "${vision_whl}" "${audio_whl}" /dist/*.whl; elif [ -n "$PINNED_TORCH_VERSION" ]; then echo "[INFO] Installing pinned torch nightly version to build vllm: $PINNED_TORCH_VERSION"; uv pip install --system "$PINNED_TORCH_VERSION" --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. -f1,2 | tr -d '.'); else echo "[INFO] Installing torch nightly with latest one to build vllm"; uv pip install --system torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. 
-f1,2 | tr -d '.'); fi 2025-09-07T06:23:15.1297110Z #16 0.938 [INFO] Installing torch wheels to build vllm 2025-09-07T06:23:15.6390988Z #16 1.032 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:23:15.6391688Z #16 1.089 Resolved 31 packages in 53ms 2025-09-07T06:23:15.6392064Z #16 8.724 Prepared 31 packages in 7.63s 2025-09-07T06:23:15.6392426Z #16 8.866 Uninstalled 1 package in 142ms 2025-09-07T06:23:15.6392804Z #16 10.58 Installed 31 packages in 1.71s 2025-09-07T06:23:15.8731023Z #16 10.66 + filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T06:23:15.8731919Z #16 10.66 + fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T06:23:15.8732570Z #16 10.66 + jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T06:23:15.8733401Z #16 10.66 + markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T06:23:15.8734227Z #16 10.66 + mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T06:23:15.8734808Z #16 10.66 + networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T06:23:15.8735594Z #16 10.66 + numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:23:15.8736672Z #16 10.66 + nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:23:15.8739282Z #16 10.66 + nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) 2025-09-07T06:23:15.8740314Z #16 10.66 + nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T06:23:15.8741421Z #16 10.66 + nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 
2025-09-07T06:23:15.8742428Z #16 10.66 + nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:23:15.8743370Z #16 10.66 + nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:23:15.8744380Z #16 10.66 + nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:23:15.8745352Z #16 10.66 + nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:23:15.8746272Z #16 10.66 + nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:23:15.8747483Z #16 10.66 + nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:23:15.8748720Z #16 10.66 + nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T06:23:15.8750328Z #16 10.66 + nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:23:15.8751492Z #16 10.66 + nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T06:23:15.8752695Z #16 10.66 + nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:23:15.8753960Z #16 10.66 + nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T06:23:15.8754679Z #16 10.66 + opt-einsum==3.4.0 2025-09-07T06:23:15.8755399Z #16 10.66 + pillow==11.3.0 (from 
file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:23:15.8756690Z #16 10.66 + pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:23:15.8757740Z #16 10.66 - setuptools==80.9.0 2025-09-07T06:23:15.8758223Z #16 10.66 + setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T06:23:15.8759194Z #16 10.66 + sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T06:23:15.8759998Z #16 10.66 + torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:23:15.8761109Z #16 10.66 + torchaudio==2.8.0.dev20250901+cu129 (from file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:23:15.8762385Z #16 10.66 + torchvision==0.24.0.dev20250901+cu129 (from file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:23:15.8763344Z #16 10.66 + typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T06:25:05.9628637Z #16 ... 
2025-09-07T06:25:05.9628852Z 2025-09-07T06:25:05.9631843Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:25:05.9635313Z #9 DONE 124.7s 2025-09-07T06:25:06.1261305Z 2025-09-07T06:25:06.1266218Z #16 [base 10/20] RUN --mount=type=bind,source=tmp,target=/dist --mount=type=cache,target=/root/.cache/uv if [ -n "tmp" ] && [ "tmp" != "./requirements" ] && [ -d "/dist" ] && ls /dist/torch*.whl >/dev/null 2>&1; then echo "[INFO] Installing torch wheels to build vllm"; torch_whl=$(find /dist -maxdepth 1 -name 'torch-*.whl' -print -quit); vision_whl=$(find /dist -name 'torchvision*.whl' | head -n1 | xargs); audio_whl=$(find /dist -name 'torchaudio*.whl' | head -n1 | xargs); uv pip install --system "${torch_whl}[opt-einsum]" "${vision_whl}" "${audio_whl}" /dist/*.whl; elif [ -n "$PINNED_TORCH_VERSION" ]; then echo "[INFO] Installing pinned torch nightly version to build vllm: $PINNED_TORCH_VERSION"; uv pip install --system "$PINNED_TORCH_VERSION" --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. -f1,2 | tr -d '.'); else echo "[INFO] Installing torch nightly with latest one to build vllm"; uv pip install --system torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. 
-f1,2 | tr -d '.'); fi 2025-09-07T06:25:06.1270891Z #16 DONE 120.9s 2025-09-07T06:25:06.1271045Z 2025-09-07T06:25:06.1271407Z #17 [base 11/20] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system numba==0.61.2 2025-09-07T06:25:06.4060654Z #17 0.431 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:25:06.6487438Z #17 0.508 Resolved 3 packages in 72ms 2025-09-07T06:25:06.6487870Z #17 0.516 Downloading numba (3.7MiB) 2025-09-07T06:25:06.6488225Z #17 0.516 Downloading llvmlite (40.4MiB) 2025-09-07T06:25:06.6488576Z #17 0.523 Downloading numpy (15.8MiB) 2025-09-07T06:25:06.9510005Z #17 0.975 Downloading llvmlite 2025-09-07T06:25:07.0701581Z #17 1.095 Downloading numba 2025-09-07T06:25:07.1895845Z #17 1.178 Downloading numpy 2025-09-07T06:25:07.1896256Z #17 1.178 Prepared 3 packages in 669ms 2025-09-07T06:25:07.1896622Z #17 1.214 Uninstalled 1 package in 36ms 2025-09-07T06:25:07.4258239Z #17 1.300 Installed 3 packages in 85ms 2025-09-07T06:25:07.4258645Z #17 1.300 + llvmlite==0.44.0 2025-09-07T06:25:07.4258999Z #17 1.300 + numba==0.61.2 2025-09-07T06:25:07.4259613Z #17 1.300 - numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:25:07.4260242Z #17 1.300 + numpy==2.2.6 2025-09-07T06:25:09.6232580Z #17 DONE 3.6s 2025-09-07T06:25:09.7766647Z 2025-09-07T06:25:09.7768429Z #18 [base 12/20] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system -r requirements/common.txt 2025-09-07T06:25:10.1279929Z #18 0.502 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:25:10.7213233Z #18 1.096 Resolved 133 packages in 587ms 2025-09-07T06:25:10.9003606Z #18 1.105 Downloading aiohttp (1.6MiB) 2025-09-07T06:25:10.9004080Z #18 1.107 Downloading pydantic-core (1.9MiB) 2025-09-07T06:25:10.9004485Z #18 1.107 Downloading transformers (11.1MiB) 2025-09-07T06:25:10.9004883Z #18 1.107 Downloading scipy (33.5MiB) 2025-09-07T06:25:10.9005240Z #18 1.107 
Downloading soundfile (1.3MiB) 2025-09-07T06:25:10.9005597Z #18 1.108 Downloading pygments (1.2MiB) 2025-09-07T06:25:10.9006015Z #18 1.109 Downloading opencv-python-headless (51.5MiB) 2025-09-07T06:25:10.9006448Z #18 1.109 Downloading tokenizers (3.2MiB) 2025-09-07T06:25:10.9006804Z #18 1.119 Downloading hf-xet (3.0MiB) 2025-09-07T06:25:10.9007159Z #18 1.120 Downloading pycountry (6.0MiB) 2025-09-07T06:25:10.9007863Z #18 1.120 Downloading openai-harmony (2.9MiB) 2025-09-07T06:25:10.9008241Z #18 1.121 Downloading xgrammar (7.5MiB) 2025-09-07T06:25:10.9008583Z #18 1.121 Downloading tiktoken (1.1MiB) 2025-09-07T06:25:10.9008957Z #18 1.122 Downloading llguidance (14.3MiB) 2025-09-07T06:25:10.9009309Z #18 1.122 Downloading uvloop (4.5MiB) 2025-09-07T06:25:10.9009677Z #18 1.122 Downloading mistral-common (6.2MiB) 2025-09-07T06:25:10.9010071Z #18 1.123 Downloading outlines-core (2.2MiB) 2025-09-07T06:25:10.9010426Z #18 1.124 Downloading triton (148.4MiB) 2025-09-07T06:25:10.9010918Z #18 1.124 Downloading sentencepiece (1.3MiB) 2025-09-07T06:25:11.1660052Z #18 1.540 Downloading tiktoken 2025-09-07T06:25:11.3066872Z #18 1.557 Downloading soundfile 2025-09-07T06:25:11.3067289Z #18 1.598 Downloading sentencepiece 2025-09-07T06:25:11.3067650Z #18 1.681 Downloading aiohttp 2025-09-07T06:25:11.4529703Z #18 1.697 Downloading pydantic-core 2025-09-07T06:25:11.4530130Z #18 1.736 Downloading outlines-core 2025-09-07T06:25:11.4530506Z #18 1.827 Downloading openai-harmony 2025-09-07T06:25:11.6020093Z #18 1.855 Downloading hf-xet 2025-09-07T06:25:11.6020484Z #18 1.860 Downloading tokenizers 2025-09-07T06:25:11.6020835Z #18 1.976 Downloading pygments 2025-09-07T06:25:11.8230999Z #18 2.047 Downloading uvloop 2025-09-07T06:25:11.9200820Z #18 2.294 Downloading xgrammar 2025-09-07T06:25:12.0739360Z #18 2.298 Downloading mistral-common 2025-09-07T06:25:12.0834716Z #18 2.458 Downloading pycountry 2025-09-07T06:25:12.2764744Z #18 2.500 Downloading llguidance 2025-09-07T06:25:13.0961591Z #18 
3.470 Downloading opencv-python-headless 2025-09-07T06:25:14.0579265Z #18 4.432 Downloading scipy 2025-09-07T06:25:14.2140943Z #18 4.439 Downloading triton 2025-09-07T06:25:14.2396422Z #18 4.614 Downloading transformers 2025-09-07T06:25:14.3904796Z #18 4.614 Prepared 104 packages in 3.51s 2025-09-07T06:25:14.7682966Z #18 5.143 Installed 104 packages in 528ms 2025-09-07T06:25:14.9214254Z #18 5.143 + aiohappyeyeballs==2.6.1 2025-09-07T06:25:14.9214722Z #18 5.143 + aiohttp==3.12.15 2025-09-07T06:25:14.9215097Z #18 5.143 + aiosignal==1.4.0 2025-09-07T06:25:14.9215428Z #18 5.143 + annotated-types==0.7.0 2025-09-07T06:25:14.9215760Z #18 5.143 + anyio==4.10.0 2025-09-07T06:25:14.9216323Z #18 5.143 + astor==0.8.1 2025-09-07T06:25:14.9216611Z #18 5.143 + attrs==25.3.0 2025-09-07T06:25:14.9216900Z #18 5.143 + blake3==1.0.5 2025-09-07T06:25:14.9217181Z #18 5.143 + cachetools==6.2.0 2025-09-07T06:25:14.9217480Z #18 5.143 + cbor2==5.7.0 2025-09-07T06:25:14.9217752Z #18 5.143 + certifi==2025.8.3 2025-09-07T06:25:14.9218050Z #18 5.143 + cffi==1.17.1 2025-09-07T06:25:14.9218342Z #18 5.143 + charset-normalizer==3.4.3 2025-09-07T06:25:14.9218687Z #18 5.143 + click==8.2.1 2025-09-07T06:25:14.9218979Z #18 5.143 + cloudpickle==3.1.1 2025-09-07T06:25:14.9219301Z #18 5.143 + compressed-tensors==0.11.0 2025-09-07T06:25:14.9219651Z #18 5.143 + depyf==0.19.0 2025-09-07T06:25:14.9219923Z #18 5.143 + dill==0.4.0 2025-09-07T06:25:14.9220209Z #18 5.143 + diskcache==5.6.3 2025-09-07T06:25:14.9220513Z #18 5.143 + distro==1.9.0 2025-09-07T06:25:14.9220804Z #18 5.143 + dnspython==2.7.0 2025-09-07T06:25:14.9221093Z #18 5.143 + einops==0.8.1 2025-09-07T06:25:14.9221404Z #18 5.143 + email-validator==2.3.0 2025-09-07T06:25:14.9221725Z #18 5.143 + fastapi==0.116.1 2025-09-07T06:25:14.9222036Z #18 5.143 + fastapi-cli==0.0.10 2025-09-07T06:25:14.9222366Z #18 5.143 + fastapi-cloud-cli==0.1.5 2025-09-07T06:25:14.9222701Z #18 5.143 + frozendict==2.4.6 2025-09-07T06:25:14.9223014Z #18 5.143 + 
frozenlist==1.7.0 2025-09-07T06:25:14.9223304Z #18 5.143 + gguf==0.17.1 2025-09-07T06:25:14.9223689Z #18 5.143 + h11==0.16.0 2025-09-07T06:25:14.9223948Z #18 5.143 + hf-xet==1.1.9 2025-09-07T06:25:14.9224227Z #18 5.143 + httpcore==1.0.9 2025-09-07T06:25:14.9224527Z #18 5.143 + httptools==0.6.4 2025-09-07T06:25:14.9224825Z #18 5.143 + httpx==0.28.1 2025-09-07T06:25:14.9225121Z #18 5.143 + huggingface-hub==0.34.4 2025-09-07T06:25:14.9225432Z #18 5.143 + idna==3.10 2025-09-07T06:25:14.9225846Z #18 5.144 + interegular==0.3.3 2025-09-07T06:25:14.9226134Z #18 5.144 + jiter==0.10.0 2025-09-07T06:25:14.9226422Z #18 5.144 + jsonschema==4.25.1 2025-09-07T06:25:14.9226766Z #18 5.144 + jsonschema-specifications==2025.4.1 2025-09-07T06:25:14.9227145Z #18 5.144 + lark==1.2.2 2025-09-07T06:25:14.9227414Z #18 5.144 + llguidance==0.7.30 2025-09-07T06:25:14.9227738Z #18 5.144 + lm-format-enforcer==0.11.3 2025-09-07T06:25:14.9228081Z #18 5.144 + markdown-it-py==4.0.0 2025-09-07T06:25:14.9228402Z #18 5.144 + mdurl==0.1.2 2025-09-07T06:25:14.9228693Z #18 5.144 + mistral-common==1.8.4 2025-09-07T06:25:14.9229004Z #18 5.144 + msgspec==0.19.0 2025-09-07T06:25:14.9229303Z #18 5.144 + multidict==6.6.4 2025-09-07T06:25:14.9229587Z #18 5.144 + ninja==1.13.0 2025-09-07T06:25:14.9229867Z #18 5.144 + openai==1.106.1 2025-09-07T06:25:14.9230157Z #18 5.144 + openai-harmony==0.0.4 2025-09-07T06:25:14.9230522Z #18 5.144 + opencv-python-headless==4.12.0.88 2025-09-07T06:25:14.9230889Z #18 5.144 + outlines-core==0.2.10 2025-09-07T06:25:14.9231250Z #18 5.144 + partial-json-parser==0.2.1.1.post6 2025-09-07T06:25:14.9231639Z #18 5.144 + prometheus-client==0.22.1 2025-09-07T06:25:14.9232142Z #18 5.144 + prometheus-fastapi-instrumentator==7.1.0 2025-09-07T06:25:14.9232534Z #18 5.144 + propcache==0.3.2 2025-09-07T06:25:14.9232815Z #18 5.144 + protobuf==6.32.0 2025-09-07T06:25:14.9233101Z #18 5.144 + psutil==7.0.0 2025-09-07T06:25:14.9233369Z #18 5.144 + py-cpuinfo==9.0.0 2025-09-07T06:25:14.9233661Z #18 
5.144 + pybase64==1.4.2 2025-09-07T06:25:14.9233934Z #18 5.144 + pycountry==24.6.1 2025-09-07T06:25:14.9234224Z #18 5.144 + pycparser==2.22 2025-09-07T06:25:14.9234499Z #18 5.144 + pydantic==2.11.7 2025-09-07T06:25:14.9234795Z #18 5.144 + pydantic-core==2.33.2 2025-09-07T06:25:14.9235130Z #18 5.144 + pydantic-extra-types==2.10.5 2025-09-07T06:25:14.9235460Z #18 5.144 + pygments==2.19.2 2025-09-07T06:25:14.9235757Z #18 5.144 + python-dotenv==1.1.1 2025-09-07T06:25:14.9236070Z #18 5.144 + python-json-logger==3.3.0 2025-09-07T06:25:14.9236418Z #18 5.144 + python-multipart==0.0.20 2025-09-07T06:25:14.9236888Z #18 5.144 + pyyaml==6.0.2 2025-09-07T06:25:14.9237189Z #18 5.144 + pyzmq==27.0.2 2025-09-07T06:25:14.9237564Z #18 5.144 + referencing==0.36.2 2025-09-07T06:25:14.9237865Z #18 5.144 + regex==2025.9.1 2025-09-07T06:25:14.9238156Z #18 5.145 + requests==2.32.5 2025-09-07T06:25:14.9238435Z #18 5.145 + rich==14.1.0 2025-09-07T06:25:14.9238720Z #18 5.145 + rich-toolkit==0.15.1 2025-09-07T06:25:14.9239016Z #18 5.145 + rignore==0.6.4 2025-09-07T06:25:14.9239306Z #18 5.145 + rpds-py==0.27.1 2025-09-07T06:25:14.9239592Z #18 5.145 + safetensors==0.6.2 2025-09-07T06:25:14.9239893Z #18 5.145 + scipy==1.16.1 2025-09-07T06:25:14.9240172Z #18 5.145 + sentencepiece==0.2.1 2025-09-07T06:25:14.9240493Z #18 5.145 + sentry-sdk==2.37.0 2025-09-07T06:25:14.9240802Z #18 5.145 + setproctitle==1.3.7 2025-09-07T06:25:14.9241103Z #18 5.145 + shellingham==1.5.4 2025-09-07T06:25:14.9241406Z #18 5.145 + six==1.17.0 2025-09-07T06:25:14.9241675Z #18 5.145 + sniffio==1.3.1 2025-09-07T06:25:14.9241966Z #18 5.145 + soundfile==0.13.1 2025-09-07T06:25:14.9242251Z #18 5.145 + soxr==0.5.0.post1 2025-09-07T06:25:14.9242547Z #18 5.145 + starlette==0.47.3 2025-09-07T06:25:14.9242836Z #18 5.145 + tiktoken==0.11.0 2025-09-07T06:25:14.9243140Z #18 5.145 + tokenizers==0.22.0 2025-09-07T06:25:14.9243425Z #18 5.145 + tqdm==4.67.1 2025-09-07T06:25:14.9243710Z #18 5.145 + transformers==4.56.1 
2025-09-07T06:25:14.9244009Z #18 5.145 + triton==3.4.0 2025-09-07T06:25:14.9244289Z #18 5.145 + typer==0.17.4 2025-09-07T06:25:14.9244590Z #18 5.145 + typing-inspection==0.4.1 2025-09-07T06:25:14.9244909Z #18 5.145 + urllib3==2.5.0 2025-09-07T06:25:14.9245196Z #18 5.145 + uvicorn==0.35.0 2025-09-07T06:25:14.9245477Z #18 5.145 + uvloop==0.21.0 2025-09-07T06:25:14.9245767Z #18 5.145 + watchfiles==1.1.0 2025-09-07T06:25:14.9246059Z #18 5.145 + websockets==15.0.1 2025-09-07T06:25:14.9246362Z #18 5.145 + xgrammar==0.1.23 2025-09-07T06:25:14.9246718Z #18 5.145 + yarl==1.20.1 2025-09-07T06:25:32.1697679Z #18 DONE 22.5s 2025-09-07T06:25:32.3231121Z 2025-09-07T06:25:32.3231812Z #19 [base 13/20] RUN echo 7.5;8.0+PTX;9.0a 2025-09-07T06:25:32.7155140Z #19 0.543 7.5;8.0+PTX;9.0a 2025-09-07T06:25:32.8813981Z #19 DONE 0.6s 2025-09-07T06:25:32.8814188Z 2025-09-07T06:25:32.8814304Z #20 [base 14/20] RUN echo 42 2025-09-07T06:25:33.8651485Z #20 1.135 42 2025-09-07T06:25:34.0311587Z #20 DONE 1.1s 2025-09-07T06:25:34.0312148Z 2025-09-07T06:25:34.0312656Z #21 [base 15/20] RUN pip freeze | grep -E 'ninja' 2025-09-07T06:25:35.2060304Z #21 1.326 ninja==1.13.0 2025-09-07T06:25:35.3719217Z #21 DONE 1.3s 2025-09-07T06:25:35.3719710Z 2025-09-07T06:25:35.3725221Z #22 [base 16/20] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/uv echo 'git clone xformers...' && git clone https://github.com/facebookresearch/xformers.git --recursive && cd xformers && git checkout 5d4b92a5e5a9c6c6d4878283f47d82e17995b468 && git submodule update --init --recursive && echo 'finish git clone xformers...' && rm -rf build && python3 setup.py bdist_wheel --dist-dir=../xformers-dist --verbose && cd .. && rm -rf xformers 2025-09-07T06:25:35.9642657Z #22 0.743 git clone xformers... 2025-09-07T06:25:36.1173766Z #22 0.746 Cloning into 'xformers'... 
2025-09-07T06:25:37.2736732Z #22 2.053 Submodule 'third_party/composable_kernel_tiled' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel_tiled' 2025-09-07T06:25:37.4331919Z #22 2.053 Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-09-07T06:25:37.4335253Z #22 2.053 Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-09-07T06:25:37.4337968Z #22 2.061 Cloning into '/workspace/xformers/third_party/composable_kernel_tiled'... 2025-09-07T06:25:40.1736977Z #22 4.953 Cloning into '/workspace/xformers/third_party/cutlass'... 2025-09-07T06:25:42.0805821Z #22 6.860 Cloning into '/workspace/xformers/third_party/flash-attention'... 2025-09-07T06:25:42.9390724Z #22 7.718 Submodule path 'third_party/composable_kernel_tiled': checked out '50fad035248b154cdfa4505cf5de7465ce146149' 2025-09-07T06:25:43.6613693Z #22 8.440 Submodule path 'third_party/cutlass': checked out 'e9627ce55b42fd2599f58cd4396da9380954def0' 2025-09-07T06:25:43.7770227Z #22 8.556 Submodule path 'third_party/flash-attention': checked out '3ba6f826b199ff68aa9e9139a46280160defa5cd' 2025-09-07T06:25:43.9426484Z #22 8.563 Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:25:43.9429354Z #22 8.564 Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:25:43.9430361Z #22 8.571 Cloning into '/workspace/xformers/third_party/flash-attention/csrc/composable_kernel'... 2025-09-07T06:25:46.7072005Z #22 11.49 Cloning into '/workspace/xformers/third_party/flash-attention/csrc/cutlass'... 
2025-09-07T06:25:48.9690677Z #22 13.75 Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out 'd58f2b8bd0c2adad65a731403673d545d8483acb' 2025-09-07T06:25:49.7491859Z #22 14.53 Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'dc4817921edda44a549197ff3a9dcf5df0636e7b' 2025-09-07T06:25:51.2796046Z #22 16.06 Note: switching to '5d4b92a5e5a9c6c6d4878283f47d82e17995b468'. 2025-09-07T06:25:51.2796519Z #22 16.06 2025-09-07T06:25:51.2796928Z #22 16.06 You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T06:25:51.2797587Z #22 16.06 changes and commit them, and you can discard any commits you make in this 2025-09-07T06:25:51.2798223Z #22 16.06 state without impacting any branches by switching back to a branch. 2025-09-07T06:25:51.2798693Z #22 16.06 2025-09-07T06:25:51.2799299Z #22 16.06 If you want to create a new branch to retain commits you create, you may 2025-09-07T06:25:51.2799902Z #22 16.06 do so (now or later) by using -c with the switch command. Example: 2025-09-07T06:25:51.2800326Z #22 16.06 2025-09-07T06:25:51.2800607Z #22 16.06 git switch -c <new-branch-name> 2025-09-07T06:25:51.2800956Z #22 16.06 2025-09-07T06:25:51.2801194Z #22 16.06 Or undo this operation with: 2025-09-07T06:25:51.2801524Z #22 16.06 2025-09-07T06:25:51.2801744Z #22 16.06 git switch - 2025-09-07T06:25:51.2802011Z #22 16.06 2025-09-07T06:25:51.2802407Z #22 16.06 Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T06:25:51.2802906Z #22 16.06 2025-09-07T06:25:51.2803258Z #22 16.06 HEAD is now at 5d4b92a5 Update wheels matrix for ROCM + README update 2025-09-07T06:25:51.4622285Z #22 16.24 finish git clone xformers... 
2025-09-07T06:25:55.8981391Z #22 20.68 W0907 06:25:55.896000 448 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:119] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 2025-09-07T06:25:56.0089048Z #22 20.79 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated. 2025-09-07T06:25:56.0089988Z #22 20.79 !! 2025-09-07T06:25:56.0090231Z #22 20.79 2025-09-07T06:25:56.0090513Z #22 20.79 ******************************************************************************** 2025-09-07T06:25:56.0091492Z #22 20.79 Please consider removing the following classifiers in favor of a SPDX license expression: 2025-09-07T06:25:56.0092099Z #22 20.79 2025-09-07T06:25:56.0093931Z #22 20.79 License :: OSI Approved :: BSD License 2025-09-07T06:25:56.0094329Z #22 20.79 2025-09-07T06:25:56.0094889Z #22 20.79 See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details. 2025-09-07T06:25:56.0095603Z #22 20.79 ******************************************************************************** 2025-09-07T06:25:56.0095995Z #22 20.79 2025-09-07T06:25:56.0096221Z #22 20.79 !! 
2025-09-07T06:25:56.0096498Z #22 20.79 self._finalize_license_expression() 2025-09-07T06:25:56.1089586Z #22 20.82 running bdist_wheel 2025-09-07T06:25:56.1090247Z #22 20.87 running build 2025-09-07T06:25:56.1090570Z #22 20.87 running build_py 2025-09-07T06:25:56.1091095Z #22 20.89 creating build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3266796Z #22 20.89 copying xformers/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3268204Z #22 20.89 copying xformers/_cpp_lib.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3269591Z #22 20.89 copying xformers/_deprecation_warning.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3271062Z #22 20.89 copying xformers/attn_bias_utils.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3272474Z #22 20.89 copying xformers/checkpoint.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3273808Z #22 20.89 copying xformers/info.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3275076Z #22 20.89 copying xformers/test.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3276349Z #22 20.89 copying xformers/utils.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:25:56.3277534Z #22 20.89 creating build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3278961Z #22 20.89 copying xformers/benchmarks/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3280808Z #22 20.89 copying xformers/benchmarks/benchmark_attn_decoding.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3282774Z #22 20.89 copying xformers/benchmarks/benchmark_core.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3284693Z #22 20.89 copying xformers/benchmarks/benchmark_indexing.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3287082Z #22 20.89 copying 
xformers/benchmarks/benchmark_mem_eff_attention.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3289230Z #22 20.89 copying xformers/benchmarks/benchmark_merge_attentions.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3291462Z #22 20.89 copying xformers/benchmarks/benchmark_nystrom_utils.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3293413Z #22 20.89 copying xformers/benchmarks/benchmark_revnet.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3295270Z #22 20.89 copying xformers/benchmarks/benchmark_sddmm.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3297371Z #22 20.89 copying xformers/benchmarks/benchmark_sequence_parallel_fused.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3299438Z #22 20.89 copying xformers/benchmarks/benchmark_sp24.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3301317Z #22 20.89 copying xformers/benchmarks/benchmark_swiglu.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3303205Z #22 20.89 copying xformers/benchmarks/benchmark_tiled_matmul.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3305076Z #22 20.89 copying xformers/benchmarks/utils.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:25:56.3306511Z #22 20.89 creating build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:25:56.3307961Z #22 20.89 copying xformers/components/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:25:56.3309623Z #22 20.89 copying xformers/components/input_projection.py -> build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:25:56.3311052Z #22 20.89 copying xformers/components/residual.py -> build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:25:56.3312245Z #22 
20.90 creating build/lib.linux-x86_64-cpython-312/xformers/flash_attn_3 2025-09-07T06:25:56.3313438Z #22 20.90 copying xformers/flash_attn_3/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/flash_attn_3 2025-09-07T06:25:56.3314814Z #22 20.90 creating build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3315776Z #22 20.90 copying xformers/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3316887Z #22 20.90 copying xformers/ops/common.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3318162Z #22 20.90 copying xformers/ops/differentiable_collectives.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3319472Z #22 20.90 copying xformers/ops/indexing.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3320692Z #22 20.90 copying xformers/ops/modpar_layers.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3321882Z #22 20.90 copying xformers/ops/rmsnorm.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3323118Z #22 20.90 copying xformers/ops/rope_padded.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3324306Z #22 20.90 copying xformers/ops/seqpar.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3325685Z #22 20.90 copying xformers/ops/sequence_parallel_fused_ops.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3327135Z #22 20.90 copying xformers/ops/sp24.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3328543Z #22 20.90 copying xformers/ops/swiglu_op.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3329845Z #22 20.90 copying xformers/ops/tiled_matmul.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3331209Z #22 20.90 copying xformers/ops/tree_attention.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3332353Z #22 20.90 
copying xformers/ops/unbind.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:25:56.3333661Z #22 20.90 creating build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3334685Z #22 20.90 copying xformers/profiler/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3335978Z #22 20.90 copying xformers/profiler/api.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3337277Z #22 20.90 copying xformers/profiler/device_limits.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3338571Z #22 20.90 copying xformers/profiler/find_slowest.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3339983Z #22 20.90 copying xformers/profiler/profile_analyzer.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3341320Z #22 20.90 copying xformers/profiler/profiler.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3342702Z #22 20.90 copying xformers/profiler/profiler_dcgm.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3344409Z #22 20.90 copying xformers/profiler/profiler_dcgm_impl.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:25:56.3345900Z #22 20.90 creating build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:25:56.3346934Z #22 20.90 copying xformers/sparse/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:25:56.3348125Z #22 20.90 copying xformers/sparse/_csr_ops.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:25:56.3349705Z #22 20.90 copying xformers/sparse/blocksparse_tensor.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:25:56.3351395Z #22 20.90 copying xformers/sparse/csr_tensor.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:25:56.3353004Z #22 20.90 copying xformers/sparse/utils.py -> 
build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:25:56.3354269Z #22 20.90 creating build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:25:56.3355298Z #22 20.90 copying xformers/triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:25:56.3356884Z #22 20.90 copying xformers/triton/importing.py -> build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:25:56.3358085Z #22 20.90 copying xformers/triton/vararg_kernel.py -> build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:25:56.3359170Z #22 20.90 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3360278Z #22 20.90 copying xformers/_flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3361569Z #22 20.90 copying xformers/_flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3362964Z #22 20.90 copying xformers/_flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3364378Z #22 20.90 copying xformers/_flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3365816Z #22 20.90 copying xformers/_flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3367420Z #22 20.90 copying xformers/_flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3369152Z #22 20.90 copying xformers/_flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3370664Z #22 20.90 copying xformers/_flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:25:56.3371934Z #22 20.90 creating build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3373113Z #22 20.91 copying 
xformers/benchmarks/LRA/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3374596Z #22 20.91 copying xformers/benchmarks/LRA/batch_fetch_results.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3376377Z #22 20.91 copying xformers/benchmarks/LRA/batch_submit.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3377886Z #22 20.91 copying xformers/benchmarks/LRA/run_grid_search.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3379355Z #22 20.91 copying xformers/benchmarks/LRA/run_tasks.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3380925Z #22 20.91 copying xformers/benchmarks/LRA/run_with_submitit.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:25:56.3382507Z #22 20.91 creating build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:25:56.3383783Z #22 20.91 copying xformers/benchmarks/LRA/code/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:25:56.3385406Z #22 20.91 copying xformers/benchmarks/LRA/code/dataset.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:25:56.3387050Z #22 20.91 copying xformers/benchmarks/LRA/code/model_wrapper.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:25:56.3388425Z #22 20.91 creating build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3389766Z #22 20.91 copying xformers/components/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3391409Z #22 20.91 copying xformers/components/attention/_sputnik_sparse.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3393183Z #22 20.91 copying xformers/components/attention/attention_mask.py -> 
build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3395005Z #22 20.91 copying xformers/components/attention/attention_patterns.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3396762Z #22 20.91 copying xformers/components/attention/base.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3398706Z #22 20.91 copying xformers/components/attention/core.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3400331Z #22 20.91 copying xformers/components/attention/fourier_mix.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3402065Z #22 20.91 copying xformers/components/attention/scaled_dot_product.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3403823Z #22 20.91 copying xformers/components/attention/sparsity_config.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3405495Z #22 20.91 copying xformers/components/attention/utils.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:25:56.3406762Z #22 20.91 creating build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3408073Z #22 20.91 copying xformers/ops/_triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3409514Z #22 20.91 copying xformers/ops/_triton/k_index_select_cat.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3411188Z #22 20.91 copying xformers/ops/_triton/k_scaled_index_add.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3412662Z #22 20.91 copying xformers/ops/_triton/matmul_perf_model.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3414136Z #22 20.91 copying xformers/ops/_triton/rmsnorm_kernels.py -> 
build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3415582Z #22 20.91 copying xformers/ops/_triton/rope_padded_kernels.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3417113Z #22 20.91 copying xformers/ops/_triton/tiled_matmul_kernels.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:25:56.3418517Z #22 20.91 creating build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3419585Z #22 20.91 copying xformers/ops/fmha/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3420865Z #22 20.91 copying xformers/ops/fmha/attn_bias.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3422127Z #22 20.91 copying xformers/ops/fmha/ck.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3423346Z #22 20.91 copying xformers/ops/fmha/ck_splitk.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3424650Z #22 20.91 copying xformers/ops/fmha/common.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3425860Z #22 20.91 copying xformers/ops/fmha/cutlass.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3427175Z #22 20.91 copying xformers/ops/fmha/dispatch.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3428534Z #22 20.91 copying xformers/ops/fmha/flash.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3429748Z #22 20.91 copying xformers/ops/fmha/flash3.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3431005Z #22 20.91 copying xformers/ops/fmha/merge_training.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3432379Z #22 20.91 copying xformers/ops/fmha/torch_attention_compat.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3433809Z #22 20.91 copying 
xformers/ops/fmha/triton_splitk.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:25:56.3435089Z #22 20.91 creating build/lib.linux-x86_64-cpython-312/xformers/ops/fmha/_triton 2025-09-07T06:25:56.3436281Z #22 20.91 copying xformers/ops/fmha/_triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha/_triton 2025-09-07T06:25:56.3437790Z #22 20.91 copying xformers/ops/fmha/_triton/splitk_kernels.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha/_triton 2025-09-07T06:25:56.3439269Z #22 20.92 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3441215Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3443298Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/bench.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3445778Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3447784Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill_fused.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3457242Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill_onekernel.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3459779Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill_split.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3461847Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_ref.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3464188Z #22 20.92 copying 
xformers/_flash_attn/flash_attn_triton_amd/fp8.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3466892Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/fwd_decode.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3468964Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/fwd_prefill.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3471513Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/fwd_ref.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3473818Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/interface_fa.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3476004Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/test.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3478189Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/train.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3497563Z #22 20.92 copying xformers/_flash_attn/flash_attn_triton_amd/utils.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:25:56.3499779Z #22 20.92 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 2025-09-07T06:25:56.3501147Z #22 20.92 copying xformers/_flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 2025-09-07T06:25:56.3502842Z #22 20.92 copying xformers/_flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 2025-09-07T06:25:56.3504508Z #22 20.92 copying xformers/_flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 
2025-09-07T06:25:56.3505886Z #22 20.92 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/losses 2025-09-07T06:25:56.3507131Z #22 20.92 copying xformers/_flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/losses 2025-09-07T06:25:56.3508754Z #22 20.92 copying xformers/_flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/losses 2025-09-07T06:25:56.3510146Z #22 20.92 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3511378Z #22 20.92 copying xformers/_flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3513300Z #22 20.92 copying xformers/_flash_attn/models/baichuan.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3514773Z #22 20.92 copying xformers/_flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3516174Z #22 20.92 copying xformers/_flash_attn/models/bigcode.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3517673Z #22 20.92 copying xformers/_flash_attn/models/btlm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3519309Z #22 20.92 copying xformers/_flash_attn/models/falcon.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3520967Z #22 20.92 copying xformers/_flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3522570Z #22 20.92 copying xformers/_flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3524163Z #22 20.92 copying xformers/_flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3525744Z #22 20.92 copying xformers/_flash_attn/models/llama.py -> 
build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3527246Z #22 20.92 copying xformers/_flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3528719Z #22 20.92 copying xformers/_flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:25:56.3529968Z #22 20.92 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:25:56.3531366Z #22 20.92 copying xformers/_flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:25:56.3533229Z #22 20.92 copying xformers/_flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:25:56.3534862Z #22 20.92 copying xformers/_flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:25:56.3536444Z #22 20.92 copying xformers/_flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:25:56.3537906Z #22 20.92 copying xformers/_flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:25:56.3539260Z #22 20.92 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:25:56.3540466Z #22 20.92 copying xformers/_flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:25:56.3541864Z #22 20.93 copying xformers/_flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:25:56.3543347Z #22 20.93 copying xformers/_flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:25:56.3544789Z #22 20.93 copying xformers/_flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:25:56.3546289Z #22 20.93 copying 
xformers/_flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:25:56.3547590Z #22 20.93 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3549209Z #22 20.93 copying xformers/_flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3551000Z #22 20.93 copying xformers/_flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3552866Z #22 20.93 copying xformers/_flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3554656Z #22 20.93 copying xformers/_flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3556875Z #22 20.93 copying xformers/_flash_attn/utils/library.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3558785Z #22 20.93 copying xformers/_flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3560717Z #22 20.93 copying xformers/_flash_attn/utils/testing.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3562549Z #22 20.93 copying xformers/_flash_attn/utils/torch.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:25:56.3564093Z #22 20.93 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3565736Z #22 20.93 copying xformers/_flash_attn/ops/triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3567798Z #22 20.93 copying xformers/_flash_attn/ops/triton/cross_entropy.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3570068Z #22 20.93 copying xformers/_flash_attn/ops/triton/k_activations.py -> 
build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3572413Z #22 20.93 copying xformers/_flash_attn/ops/triton/layer_norm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3574065Z #22 20.93 copying xformers/_flash_attn/ops/triton/linear.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3575729Z #22 20.93 copying xformers/_flash_attn/ops/triton/mlp.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3577260Z #22 20.93 copying xformers/_flash_attn/ops/triton/rotary.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:25:56.3578350Z #22 20.93 running build_ext 2025-09-07T06:25:56.3580145Z #22 20.94 W0907 06:25:56.162000 448 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:533] There are no g++ version bounds defined for CUDA version 12.9 2025-09-07T06:25:56.3581649Z #22 20.94 building 'xformers.flash_attn_3._C' extension 2025-09-07T06:25:56.3582857Z #22 20.95 creating /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper 2025-09-07T06:25:56.3584697Z #22 20.95 creating /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations 2025-09-07T06:26:42.2523699Z #22 67.03 [1/154] c++ -MMD -MF /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_api.o.d -pthread -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include 
-I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/flash_api.cpp -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_api.o -O3 -std=c++17 -DPy_LIMITED_API=0x03090000 -fopenmp -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:27:36.8093994Z #22 121.6 [2/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_prepare_scheduler.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/flash_prepare_scheduler.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_prepare_scheduler.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math 
-DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:27:36.8107188Z #22 121.6 ptxas info : 11 bytes gmem 2025-09-07T06:27:36.8108390Z #22 121.6 ptxas info : Compiling entry function '_ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b' for 'sm_90a' 2025-09-07T06:27:36.8110261Z #22 121.6 ptxas info : Function properties for _ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b 2025-09-07T06:27:36.8111920Z #22 121.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:27:36.8112699Z #22 121.6 ptxas info : Used 13 registers, used 1 barriers, 4 bytes smem 2025-09-07T06:27:36.8113374Z #22 121.6 ptxas info : Compile time = 98.829 ms 2025-09-07T06:27:36.8113949Z #22 121.6 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:27:36.8115215Z #22 121.6 ptxas info : Compiling entry function '_ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b' for 'sm_80' 2025-09-07T06:27:36.8117095Z #22 121.6 ptxas info : Function properties for _ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b 2025-09-07T06:27:36.8118546Z #22 121.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:27:36.8119415Z #22 121.6 ptxas info : Used 14 registers, used 1 barriers, 4 bytes smem, 481 bytes cmem[0] 2025-09-07T06:27:36.8120169Z #22 121.6 ptxas info : Compile time = 39.168 ms 2025-09-07T06:30:00.6023845Z #22 265.4 [3/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_fwd_combine.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/flash_fwd_combine.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_fwd_combine.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:00.7563777Z #22 265.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:30:00.7586877Z #22 265.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7591636Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7594117Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7595250Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7596201Z #22 265.4 ptxas info : Compile time = 75.587 ms 2025-09-07T06:30:00.7598807Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7603464Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7606074Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7607170Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7608015Z #22 265.4 ptxas info : Compile time = 63.130 ms 2025-09-07T06:30:00.7610546Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7615160Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7617958Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7619082Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7620063Z #22 265.4 ptxas info : Compile time = 56.354 ms 2025-09-07T06:30:00.7622771Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7627329Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7630103Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7631242Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7632243Z #22 265.4 ptxas info : Compile time = 54.352 ms 2025-09-07T06:30:00.7635273Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7639778Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7642578Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7643726Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7644681Z #22 265.4 
ptxas info : Compile time = 93.299 ms 2025-09-07T06:30:00.7647370Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7652575Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7655352Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7656517Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7657502Z #22 265.4 ptxas info : Compile time = 82.004 ms 2025-09-07T06:30:00.7660260Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7664924Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7667704Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7668859Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7669835Z #22 265.4 ptxas info : Compile time = 72.319 ms 2025-09-07T06:30:00.7672559Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7677058Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7679820Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7680948Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7681935Z #22 265.4 ptxas info : Compile time = 70.363 ms 2025-09-07T06:30:00.7684672Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7689200Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7692104Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7693248Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7694272Z #22 265.4 ptxas info : Compile time = 73.847 ms 2025-09-07T06:30:00.7697267Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7701802Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7704592Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7705715Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7706699Z #22 265.4 
ptxas info : Compile time = 64.645 ms 2025-09-07T06:30:00.7709470Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7714100Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7716884Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7718031Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7719024Z #22 265.4 ptxas info : Compile time = 59.493 ms 2025-09-07T06:30:00.7721730Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7726222Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7729156Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7730303Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7731425Z #22 265.4 ptxas info : Compile time = 54.300 ms 2025-09-07T06:30:00.7734158Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7738627Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7741422Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7742560Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7743551Z #22 265.4 ptxas info : Compile time = 96.711 ms 2025-09-07T06:30:00.7746304Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7751059Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7753795Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7754898Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7755829Z #22 265.4 ptxas info : Compile time = 87.949 ms 2025-09-07T06:30:00.7758794Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7763284Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7766037Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7767154Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7768149Z #22 265.4 
ptxas info : Compile time = 81.652 ms 2025-09-07T06:30:00.7770860Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7775536Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7778305Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7779469Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7780451Z #22 265.4 ptxas info : Compile time = 78.115 ms 2025-09-07T06:30:00.7783195Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7787772Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7790767Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7791929Z #22 265.4 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7792875Z #22 265.4 ptxas info : Compile time = 85.183 ms 2025-09-07T06:30:00.7795658Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7800183Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7802901Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7804067Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7805064Z #22 265.4 ptxas info : Compile time = 67.406 ms 2025-09-07T06:30:00.7807739Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7812323Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7815034Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7816163Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7817098Z #22 265.4 ptxas info : Compile time = 59.689 ms 2025-09-07T06:30:00.7819815Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7824551Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7827321Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7828466Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7829466Z #22 265.4 
ptxas info : Compile time = 53.928 ms 2025-09-07T06:30:00.7832166Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7836772Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7839541Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7840636Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7841625Z #22 265.4 ptxas info : Compile time = 53.974 ms 2025-09-07T06:30:00.7844359Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7851975Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7854995Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7856138Z #22 265.4 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7857120Z #22 265.4 ptxas info : Compile time = 113.604 ms 2025-09-07T06:30:00.7859875Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7864340Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7867132Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7868278Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7869274Z #22 265.4 ptxas info : Compile time = 92.532 ms 2025-09-07T06:30:00.7872049Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7876636Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7879413Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7880565Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7881553Z #22 265.4 ptxas info : Compile time = 82.947 ms 2025-09-07T06:30:00.7884292Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7889072Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7892016Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7893161Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7894151Z #22 265.4 
ptxas info : Compile time = 77.957 ms 2025-09-07T06:30:00.7896901Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7901468Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7904259Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7905387Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7906375Z #22 265.4 ptxas info : Compile time = 73.012 ms 2025-09-07T06:30:00.7909168Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7913713Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7916471Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7917778Z #22 265.4 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7918762Z #22 265.4 ptxas info : Compile time = 84.051 ms 2025-09-07T06:30:00.7921460Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7926047Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7928795Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7929988Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7931095Z #22 265.4 ptxas info : Compile time = 68.722 ms 2025-09-07T06:30:00.7933892Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7938478Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7941250Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7942403Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7943393Z #22 265.4 ptxas info : Compile time = 59.973 ms 2025-09-07T06:30:00.7946165Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7951239Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7954022Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7955162Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7956150Z #22 265.4 
ptxas info : Compile time = 54.866 ms 2025-09-07T06:30:00.7958914Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7963494Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7966310Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7967442Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7968439Z #22 265.4 ptxas info : Compile time = 53.207 ms 2025-09-07T06:30:00.7971275Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7975785Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7978575Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7979915Z #22 265.4 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7980918Z #22 265.4 ptxas info : Compile time = 108.575 ms 2025-09-07T06:30:00.7983582Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.7988190Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.7991002Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.7992145Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.7993147Z #22 265.4 ptxas info : Compile time = 90.589 ms 2025-09-07T06:30:00.7995937Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8000427Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8003197Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8004323Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8005314Z #22 265.4 ptxas info : Compile time = 81.675 ms 2025-09-07T06:30:00.8008058Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8012920Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8015686Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8016854Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8017819Z #22 265.4 
ptxas info : Compile time = 75.453 ms 2025-09-07T06:30:00.8020574Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8025124Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8027876Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8029010Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8029984Z #22 265.4 ptxas info : Compile time = 73.327 ms 2025-09-07T06:30:00.8032721Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8037184Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8039926Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8041062Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8042185Z #22 265.4 ptxas info : Compile time = 72.617 ms 2025-09-07T06:30:00.8044884Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8049645Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8052428Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8053599Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8054583Z #22 265.4 ptxas info : Compile time = 64.740 ms 2025-09-07T06:30:00.8057166Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8061675Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8064392Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8065538Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8066514Z #22 265.4 ptxas info : Compile time = 59.227 ms 2025-09-07T06:30:00.8069260Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8073745Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8076724Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8077900Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8078893Z #22 265.4 ptxas info : Compile time = 
54.914 ms 2025-09-07T06:30:00.8081610Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8086061Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8088732Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8089921Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8091020Z #22 265.4 ptxas info : Compile time = 93.181 ms 2025-09-07T06:30:00.8093738Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8098184Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8100912Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8102093Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8103064Z #22 265.4 ptxas info : Compile time = 85.362 ms 2025-09-07T06:30:00.8105969Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8110396Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8113104Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8114259Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8115261Z #22 265.4 ptxas info : Compile time = 79.295 ms 2025-09-07T06:30:00.8117948Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8122422Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8125167Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8126334Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8127342Z #22 265.4 ptxas info : Compile time = 76.327 ms 2025-09-07T06:30:00.8130066Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8134553Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8137297Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8138336Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8139474Z #22 265.4 ptxas info : Compile time = 
72.417 ms 2025-09-07T06:30:00.8142074Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8146464Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8149389Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8150503Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8151508Z #22 265.4 ptxas info : Compile time = 60.913 ms 2025-09-07T06:30:00.8154207Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8158683Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8161404Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8162560Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8163558Z #22 265.4 ptxas info : Compile time = 53.829 ms 2025-09-07T06:30:00.8166196Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8171054Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8173792Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8174921Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8175911Z #22 265.4 ptxas info : Compile time = 54.081 ms 2025-09-07T06:30:00.8178609Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8183104Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8185864Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8186989Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8187954Z #22 265.4 ptxas info : Compile time = 96.078 ms 2025-09-07T06:30:00.8190566Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8194995Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8197706Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8198874Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8199850Z #22 265.4 ptxas info : Compile time = 
80.111 ms 2025-09-07T06:30:00.8202730Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8207068Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8209788Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8211036Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8212012Z #22 265.4 ptxas info : Compile time = 77.789 ms 2025-09-07T06:30:00.8214651Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8219083Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8221771Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8222926Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8223911Z #22 265.4 ptxas info : Compile time = 75.670 ms 2025-09-07T06:30:00.8226610Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8231886Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8234584Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8235731Z #22 265.4 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8236730Z #22 265.4 ptxas info : Compile time = 80.081 ms 2025-09-07T06:30:00.8239344Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8243834Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8246543Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8247715Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8248688Z #22 265.4 ptxas info : Compile time = 64.338 ms 2025-09-07T06:30:00.8251644Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8256080Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8258726Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8259852Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8260863Z #22 265.4 ptxas info : Compile time = 
55.178 ms 2025-09-07T06:30:00.8263770Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8268271Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8270987Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8272115Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8273108Z #22 265.4 ptxas info : Compile time = 53.473 ms 2025-09-07T06:30:00.8275788Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8280210Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8282968Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8284109Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8285094Z #22 265.4 ptxas info : Compile time = 51.215 ms 2025-09-07T06:30:00.8287719Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8292276Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8295228Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8296705Z #22 265.4 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8297695Z #22 265.4 ptxas info : Compile time = 107.123 ms 2025-09-07T06:30:00.8300340Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8304734Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8307377Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8308550Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8309555Z #22 265.4 ptxas info : Compile time = 86.307 ms 2025-09-07T06:30:00.8312297Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8316777Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8319523Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8320677Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8321697Z #22 265.4 ptxas info : Compile time = 
78.118 ms 2025-09-07T06:30:00.8324408Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8329043Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8331858Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8333028Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8334020Z #22 265.4 ptxas info : Compile time = 73.774 ms 2025-09-07T06:30:00.8336716Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8341146Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8343844Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8344983Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8345919Z #22 265.4 ptxas info : Compile time = 73.620 ms 2025-09-07T06:30:00.8348588Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8353315Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8356231Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8357381Z #22 265.4 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8358358Z #22 265.4 ptxas info : Compile time = 83.178 ms 2025-09-07T06:30:00.8361074Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8365537Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8368239Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8369400Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8370385Z #22 265.4 ptxas info : Compile time = 63.264 ms 2025-09-07T06:30:00.8373157Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8377643Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8380396Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8381539Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8382521Z #22 265.4 ptxas info : Compile time = 
56.910 ms 2025-09-07T06:30:00.8385215Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8389833Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8392581Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8393723Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8394710Z #22 265.4 ptxas info : Compile time = 54.797 ms 2025-09-07T06:30:00.8397417Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8401813Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8404484Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8405623Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8406614Z #22 265.4 ptxas info : Compile time = 54.150 ms 2025-09-07T06:30:00.8409303Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8413862Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8416587Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8417907Z #22 265.4 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8418916Z #22 265.4 ptxas info : Compile time = 107.698 ms 2025-09-07T06:30:00.8421643Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8426152Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8428918Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8430051Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8431052Z #22 265.4 ptxas info : Compile time = 86.183 ms 2025-09-07T06:30:00.8433775Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8438234Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8440987Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8442121Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8443114Z #22 265.4 ptxas info : Compile time = 
78.712 ms 2025-09-07T06:30:00.8445818Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8450592Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8453616Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8454757Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8455736Z #22 265.4 ptxas info : Compile time = 74.953 ms 2025-09-07T06:30:00.8458459Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8462928Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8465663Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8466825Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8467811Z #22 265.4 ptxas info : Compile time = 74.476 ms 2025-09-07T06:30:00.8470428Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8474666Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8477312Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8478472Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8479627Z #22 265.4 ptxas info : Compile time = 71.780 ms 2025-09-07T06:30:00.8482236Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8486498Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8489069Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8490248Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8491402Z #22 265.4 ptxas info : Compile time = 62.923 ms 2025-09-07T06:30:00.8494023Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8498275Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8500811Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8501929Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8502928Z #22 265.4 ptxas info : Compile time = 56.647 ms 2025-09-07T06:30:00.8505555Z #22 265.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8509868Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8512488Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8513827Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8514820Z #22 265.4 ptxas info : Compile time = 54.893 ms 2025-09-07T06:30:00.8517486Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8521814Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8524429Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8525574Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8526562Z #22 265.4 ptxas info : Compile time = 95.308 ms 2025-09-07T06:30:00.8529147Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8533553Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8536162Z #22 265.4 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8537315Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8538317Z #22 265.4 ptxas info : Compile time = 83.780 ms 2025-09-07T06:30:00.8540868Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8545293Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8547915Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8549385Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8550360Z #22 265.4 ptxas info : Compile time = 78.182 ms 2025-09-07T06:30:00.8552953Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8557234Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8559818Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8560994Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8561998Z #22 265.4 ptxas info : Compile time = 73.460 ms 2025-09-07T06:30:00.8564589Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:30:00.8568862Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8571570Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8572725Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8573705Z #22 265.4 ptxas info : Compile time = 70.118 ms 2025-09-07T06:30:00.8576552Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8580828Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8583473Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8584636Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8585613Z #22 265.4 ptxas info : Compile time = 61.873 ms 2025-09-07T06:30:00.8588187Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8592463Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8595089Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8596231Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8597241Z 
#22 265.4 ptxas info : Compile time = 56.587 ms 2025-09-07T06:30:00.8599859Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8604007Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8606813Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8607974Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8608972Z #22 265.4 ptxas info : Compile time = 53.830 ms 2025-09-07T06:30:00.8611715Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8615944Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8618512Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8619682Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8620672Z #22 265.4 ptxas info : Compile time = 91.453 ms 2025-09-07T06:30:00.8623248Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8627520Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8630160Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8631326Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8632291Z #22 265.4 ptxas info : Compile time = 81.125 ms 2025-09-07T06:30:00.8634900Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8639318Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8641983Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8643145Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8644111Z #22 265.4 ptxas info : Compile time = 76.117 ms 2025-09-07T06:30:00.8646741Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8651338Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8654022Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8655185Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8656158Z #22 265.4 ptxas info : Compile time = 72.423 ms 2025-09-07T06:30:00.8658799Z #22 265.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8662983Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8665591Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8666969Z #22 265.4 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8667974Z #22 265.4 ptxas info : Compile time = 81.735 ms 2025-09-07T06:30:00.8670629Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8674846Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8677397Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8678559Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8679558Z #22 265.4 ptxas info : Compile time = 66.094 ms 2025-09-07T06:30:00.8682189Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8686380Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8689029Z #22 265.4 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8690192Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8691308Z #22 265.4 ptxas info : Compile time = 57.854 ms 2025-09-07T06:30:00.8693910Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8698172Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8700977Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8702166Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8703174Z #22 265.4 ptxas info : Compile time = 52.267 ms 2025-09-07T06:30:00.8705765Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8710014Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8712637Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8713805Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8714774Z #22 265.4 ptxas info : Compile time = 49.550 ms 2025-09-07T06:30:00.8717364Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:30:00.8721620Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8724251Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8725391Z #22 265.4 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8726376Z #22 265.4 ptxas info : Compile time = 107.746 ms 2025-09-07T06:30:00.8729143Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8733536Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8736152Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8737288Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8738303Z #22 265.4 ptxas info : Compile time = 87.306 ms 2025-09-07T06:30:00.8740922Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8745205Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8747830Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8749166Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8750109Z 
#22 265.4 ptxas info : Compile time = 80.385 ms 2025-09-07T06:30:00.8752694Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8757032Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8759629Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8760782Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8762011Z #22 265.4 ptxas info : Compile time = 72.635 ms 2025-09-07T06:30:00.8764643Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8768827Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8771632Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8772783Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8773771Z #22 265.4 ptxas info : Compile time = 73.496 ms 2025-09-07T06:30:00.8776318Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8780510Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8783028Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8784033Z #22 265.4 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8785042Z #22 265.4 ptxas info : Compile time = 81.378 ms 2025-09-07T06:30:00.8787690Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8792116Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8794749Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8795896Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8796863Z #22 265.4 ptxas info : Compile time = 64.471 ms 2025-09-07T06:30:00.8799468Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8803726Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8806347Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8807511Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8808504Z #22 265.4 ptxas info : Compile time = 59.148 ms 2025-09-07T06:30:00.8811197Z #22 265.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8815449Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8818052Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8819194Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8820191Z #22 265.4 ptxas info : Compile time = 52.724 ms 2025-09-07T06:30:00.8822971Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8827267Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8829885Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8831033Z #22 265.4 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8832016Z #22 265.4 ptxas info : Compile time = 51.491 ms 2025-09-07T06:30:00.8834625Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8838933Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8841506Z #22 265.4 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8842425Z #22 265.4 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8843208Z #22 265.4 ptxas info : Compile time = 107.111 ms 2025-09-07T06:30:00.8845498Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8849987Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8852908Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8854048Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8855039Z #22 265.4 ptxas info : Compile time = 88.202 ms 2025-09-07T06:30:00.8857681Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8861890Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8864444Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8865593Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8866595Z #22 265.4 ptxas info : Compile time = 80.787 ms 2025-09-07T06:30:00.8869198Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:30:00.8873375Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8875987Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8877132Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8878141Z #22 265.4 ptxas info : Compile time = 75.370 ms 2025-09-07T06:30:00.8880734Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:00.8885261Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8887876Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8889021Z #22 265.4 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:30:00.8890023Z #22 265.4 ptxas info : Compile time = 73.282 ms 2025-09-07T06:30:00.8890748Z #22 265.4 ptxas info : 11 bytes gmem 2025-09-07T06:30:00.8893567Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8898200Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8901022Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8902029Z #22 265.4 
ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8902892Z #22 265.4 ptxas info : Compile time = 85.040 ms 2025-09-07T06:30:00.8905631Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8910215Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8912999Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8914181Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8915027Z #22 265.4 ptxas info : Compile time = 70.221 ms 2025-09-07T06:30:00.8917842Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8922458Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8925236Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8926292Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8927166Z #22 265.4 ptxas info : Compile time = 66.219 ms 2025-09-07T06:30:00.8929974Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8934676Z #22 265.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8937380Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8938435Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.8939294Z #22 265.4 ptxas info : Compile time = 65.434 ms 2025-09-07T06:30:00.8942095Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8946680Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8950034Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8951067Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8951942Z #22 265.4 ptxas info : Compile time = 103.677 ms 2025-09-07T06:30:00.8954704Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8959277Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8961944Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8962977Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8963850Z #22 265.4 ptxas info : Compile time = 90.079 ms 
2025-09-07T06:30:00.8966604Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8971009Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8973452Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8974368Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8975391Z #22 265.4 ptxas info : Compile time = 100.042 ms 2025-09-07T06:30:00.8977850Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8981972Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8984385Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8985257Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8985986Z #22 265.4 ptxas info : Compile time = 94.903 ms 2025-09-07T06:30:00.8988358Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.8992245Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.8994618Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.8995498Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.8996239Z #22 265.4 ptxas info : Compile time = 85.027 ms 2025-09-07T06:30:00.8998573Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9002598Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9005179Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9006267Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9007009Z #22 265.4 ptxas info : Compile time = 78.459 ms 2025-09-07T06:30:00.9009286Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9013143Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9015403Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9016245Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9016943Z #22 265.4 ptxas info : Compile time = 73.696 ms 
2025-09-07T06:30:00.9019121Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9022730Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9025127Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9026003Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9026717Z #22 265.4 ptxas info : Compile time = 69.849 ms 2025-09-07T06:30:00.9029065Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9033300Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9036448Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9037327Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9038038Z #22 265.4 ptxas info : Compile time = 118.060 ms 2025-09-07T06:30:00.9040859Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9046308Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9048574Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9050215Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9050978Z #22 265.4 ptxas info : Compile time = 105.236 ms 2025-09-07T06:30:00.9053160Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9056740Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9058904Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9059760Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9060444Z #22 265.4 ptxas info : Compile time = 93.367 ms 2025-09-07T06:30:00.9063027Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9066940Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9069189Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9070047Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9070727Z #22 265.4 ptxas info : Compile time = 92.508 ms 
2025-09-07T06:30:00.9072861Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9076466Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9078623Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9079435Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9080119Z #22 265.4 ptxas info : Compile time = 103.179 ms 2025-09-07T06:30:00.9082275Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9086135Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9088338Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9089142Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9089838Z #22 265.4 ptxas info : Compile time = 79.314 ms 2025-09-07T06:30:00.9092164Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9095720Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9097886Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9098693Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9099389Z #22 265.4 ptxas info : Compile time = 67.142 ms 2025-09-07T06:30:00.9101523Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9105112Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9107267Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9108071Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9108817Z #22 265.4 ptxas info : Compile time = 61.560 ms 2025-09-07T06:30:00.9111152Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9114741Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9116910Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9117834Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9118583Z #22 265.4 ptxas info : Compile time = 64.375 ms 
2025-09-07T06:30:00.9120920Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9124871Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9127250Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9128138Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9128881Z #22 265.4 ptxas info : Compile time = 131.597 ms 2025-09-07T06:30:00.9131327Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9135162Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9137738Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9138618Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9139336Z #22 265.4 ptxas info : Compile time = 105.302 ms 2025-09-07T06:30:00.9141709Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9145559Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9147886Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9149093Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9149828Z #22 265.4 ptxas info : Compile time = 99.602 ms 2025-09-07T06:30:00.9152174Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9156061Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9158420Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9159305Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9160047Z #22 265.4 ptxas info : Compile time = 92.937 ms 2025-09-07T06:30:00.9162379Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9166629Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9168961Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9169855Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9170565Z #22 265.4 ptxas info : Compile time = 85.879 ms 
2025-09-07T06:30:00.9173026Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9176880Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9179229Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9180086Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9180807Z #22 265.4 ptxas info : Compile time = 100.458 ms 2025-09-07T06:30:00.9183296Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9187422Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9189882Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9191067Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9191827Z #22 265.4 ptxas info : Compile time = 77.762 ms 2025-09-07T06:30:00.9194305Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9198264Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9200617Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9201482Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9202208Z #22 265.4 ptxas info : Compile time = 68.769 ms 2025-09-07T06:30:00.9204427Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9208080Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9210304Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9211264Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9211962Z #22 265.4 ptxas info : Compile time = 64.498 ms 2025-09-07T06:30:00.9214124Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9217903Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9220117Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9220953Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9243274Z #22 265.4 ptxas info : Compile time = 63.784 ms 
2025-09-07T06:30:00.9245567Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9249417Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9251716Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9252579Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9253348Z #22 265.4 ptxas info : Compile time = 128.994 ms 2025-09-07T06:30:00.9255569Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9259084Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9261279Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9262110Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9263089Z #22 265.4 ptxas info : Compile time = 100.086 ms 2025-09-07T06:30:00.9265287Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9268881Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9271084Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9271906Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9272601Z #22 265.4 ptxas info : Compile time = 93.101 ms 2025-09-07T06:30:00.9274824Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9278462Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9280655Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9281476Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9282141Z #22 265.4 ptxas info : Compile time = 89.509 ms 2025-09-07T06:30:00.9284240Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9288287Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9290431Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9291570Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9292210Z #22 265.4 ptxas info : Compile time = 86.821 ms 
2025-09-07T06:30:00.9294161Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9297312Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9299218Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9299972Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9300590Z #22 265.4 ptxas info : Compile time = 81.187 ms 2025-09-07T06:30:00.9302536Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9305728Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9307662Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9308430Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9309036Z #22 265.4 ptxas info : Compile time = 73.388 ms 2025-09-07T06:30:00.9310917Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9314189Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9316063Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9316795Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9317402Z #22 265.4 ptxas info : Compile time = 66.874 ms 2025-09-07T06:30:00.9319297Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9322443Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9324388Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9325249Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9325940Z #22 265.4 ptxas info : Compile time = 65.826 ms 2025-09-07T06:30:00.9328239Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9332435Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9334783Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9335674Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9336448Z #22 265.4 ptxas info : Compile time = 108.427 ms 2025-09-07T06:30:00.9338955Z #22 265.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9342901Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9345291Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9346257Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9347018Z #22 265.4 ptxas info : Compile time = 100.135 ms 2025-09-07T06:30:00.9349723Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9353779Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9356201Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9357093Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9357794Z #22 265.4 ptxas info : Compile time = 92.228 ms 2025-09-07T06:30:00.9360037Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9363930Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 
2025-09-07T06:30:00.9366936Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9367873Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9368567Z #22 265.4 ptxas info : Compile time = 86.388 ms 2025-09-07T06:30:00.9370373Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9373658Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9375567Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9376305Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9376927Z #22 265.4 ptxas info : Compile time = 79.843 ms 2025-09-07T06:30:00.9378838Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9381979Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9383880Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9384583Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9385190Z #22 265.4 ptxas info : Compile time = 72.269 ms 2025-09-07T06:30:00.9387087Z #22 265.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9390392Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9392274Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9392986Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9393585Z #22 265.4 ptxas info : Compile time = 67.022 ms 2025-09-07T06:30:00.9395473Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9398603Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9400631Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9401497Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9402124Z #22 265.4 ptxas info : Compile time = 65.708 ms 2025-09-07T06:30:00.9403923Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9407953Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9410681Z #22 265.4 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9411718Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9412476Z #22 265.4 ptxas info : Compile time = 109.934 ms 2025-09-07T06:30:00.9414968Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9419025Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9421526Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9422461Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9423256Z #22 265.4 ptxas info : Compile time = 100.136 ms 2025-09-07T06:30:00.9425860Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9430068Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9432407Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9433241Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9433852Z #22 265.4 ptxas info : Compile time = 90.999 ms 2025-09-07T06:30:00.9436020Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:30:00.9439803Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9441891Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9442688Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9443338Z #22 265.4 ptxas info : Compile time = 86.922 ms 2025-09-07T06:30:00.9445489Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9449024Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9451139Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9451951Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9452615Z #22 265.4 ptxas info : Compile time = 100.054 ms 2025-09-07T06:30:00.9454652Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9458228Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9460275Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9461067Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9461986Z #22 265.4 ptxas 
info : Compile time = 76.339 ms 2025-09-07T06:30:00.9463997Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9467336Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9469339Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9470116Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9470774Z #22 265.4 ptxas info : Compile time = 69.119 ms 2025-09-07T06:30:00.9472776Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9476203Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9478299Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9479087Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9479790Z #22 265.4 ptxas info : Compile time = 62.761 ms 2025-09-07T06:30:00.9481771Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9485345Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9487561Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9488658Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9489408Z #22 265.4 ptxas info : Compile time = 61.053 ms 2025-09-07T06:30:00.9491770Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9495452Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9497692Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9498552Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9499304Z #22 265.4 ptxas info : Compile time = 129.786 ms 2025-09-07T06:30:00.9501540Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9505294Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9507544Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9508424Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9509175Z #22 265.4 ptxas info : Compile time = 107.061 ms 2025-09-07T06:30:00.9511418Z #22 265.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9515337Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9517588Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9518451Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9519192Z #22 265.4 ptxas info : Compile time = 96.794 ms 2025-09-07T06:30:00.9521467Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9525219Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9527561Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9528448Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9529203Z #22 265.4 ptxas info : Compile time = 91.815 ms 2025-09-07T06:30:00.9531642Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9535382Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 
2025-09-07T06:30:00.9537690Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9538555Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9539255Z #22 265.4 ptxas info : Compile time = 85.136 ms 2025-09-07T06:30:00.9541694Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9545504Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9547859Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9549058Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9549812Z #22 265.4 ptxas info : Compile time = 102.346 ms 2025-09-07T06:30:00.9552106Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9555902Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9558108Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9558879Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9559520Z #22 265.4 ptxas info : Compile time = 80.553 ms 2025-09-07T06:30:00.9561542Z #22 265.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9566302Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9569498Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9570591Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9571611Z #22 265.4 ptxas info : Compile time = 68.340 ms 2025-09-07T06:30:00.9574478Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9579218Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9582092Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9583211Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9584117Z #22 265.4 ptxas info : Compile time = 66.798 ms 2025-09-07T06:30:00.9587000Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9591743Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9594536Z #22 265.4 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9595376Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9596133Z #22 265.4 ptxas info : Compile time = 66.083 ms 2025-09-07T06:30:00.9598366Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9602502Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9605065Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9606053Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9606893Z #22 265.4 ptxas info : Compile time = 131.213 ms 2025-09-07T06:30:00.9609275Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9613547Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9616095Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9617019Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9617780Z #22 265.4 ptxas info : Compile time = 106.301 ms 2025-09-07T06:30:00.9620260Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:30:00.9624362Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9626912Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9628059Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9628871Z #22 265.4 ptxas info : Compile time = 92.199 ms 2025-09-07T06:30:00.9631332Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9634689Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9636763Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9637530Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9638188Z #22 265.4 ptxas info : Compile time = 90.096 ms 2025-09-07T06:30:00.9640186Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9643450Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9645486Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9646233Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9646928Z #22 265.4 ptxas 
info : Compile time = 87.678 ms 2025-09-07T06:30:00.9649164Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9652416Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9654656Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9655464Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9656105Z #22 265.4 ptxas info : Compile time = 84.940 ms 2025-09-07T06:30:00.9658068Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9661267Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9663238Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9664073Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9664736Z #22 265.4 ptxas info : Compile time = 75.268 ms 2025-09-07T06:30:00.9666714Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9669923Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 
2025-09-07T06:30:00.9671877Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9672681Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9673345Z #22 265.4 ptxas info : Compile time = 66.905 ms 2025-09-07T06:30:00.9675333Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9678753Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9680732Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9681522Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9682165Z #22 265.4 ptxas info : Compile time = 67.311 ms 2025-09-07T06:30:00.9684142Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9687365Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9689366Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9690172Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9690831Z #22 265.4 ptxas info : Compile time = 113.267 ms 2025-09-07T06:30:00.9693018Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:30:00.9696316Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9698389Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9699273Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9699988Z #22 265.4 ptxas info : Compile time = 97.695 ms 2025-09-07T06:30:00.9703234Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9706914Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9709028Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9709856Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9710547Z #22 265.4 ptxas info : Compile time = 94.301 ms 2025-09-07T06:30:00.9712653Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9716110Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9718289Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9719151Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9719847Z #22 265.4 ptxas info : Compile time = 87.648 ms 
2025-09-07T06:30:00.9721925Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9725427Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9727674Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9728525Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9729225Z #22 265.4 ptxas info : Compile time = 84.205 ms 2025-09-07T06:30:00.9731515Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9735092Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9737226Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9738085Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9738809Z #22 265.4 ptxas info : Compile time = 75.118 ms 2025-09-07T06:30:00.9740931Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9744479Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9746611Z #22 265.4 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9747434Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9748130Z #22 265.4 ptxas info : Compile time = 67.639 ms 2025-09-07T06:30:00.9750557Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9754515Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9756793Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9757686Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9758437Z #22 265.4 ptxas info : Compile time = 66.000 ms 2025-09-07T06:30:00.9760719Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9764753Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9767152Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9768043Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9768813Z #22 265.4 ptxas info : Compile time = 111.905 ms 2025-09-07T06:30:00.9771226Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9775092Z #22 265.4 
ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9777425Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9778375Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9779454Z #22 265.4 ptxas info : Compile time = 98.298 ms 2025-09-07T06:30:00.9781894Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9785741Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9788198Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9789140Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9789918Z #22 265.4 ptxas info : Compile time = 93.209 ms 2025-09-07T06:30:00.9792262Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9796241Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9798573Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9799484Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9800320Z #22 265.4 ptxas info : Compile time = 90.209 ms 2025-09-07T06:30:00.9802691Z #22 265.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9806476Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9808966Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9809875Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9810872Z #22 265.4 ptxas info : Compile time = 103.241 ms 2025-09-07T06:30:00.9813445Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9817224Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9819583Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9820572Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9821393Z #22 265.4 ptxas info : Compile time = 81.584 ms 2025-09-07T06:30:00.9823761Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9827591Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9829991Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads 2025-09-07T06:30:00.9830942Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9831759Z #22 265.4 ptxas info : Compile time = 68.873 ms 2025-09-07T06:30:00.9834015Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9838096Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9840465Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9841366Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9842128Z #22 265.4 ptxas info : Compile time = 64.400 ms 2025-09-07T06:30:00.9844431Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9848141Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9891822Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9892817Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9893574Z #22 265.4 ptxas info : Compile time = 66.031 ms 2025-09-07T06:30:00.9895907Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9899617Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9901964Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9902871Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9903643Z #22 265.4 ptxas info : Compile time = 131.228 ms 2025-09-07T06:30:00.9906312Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9910157Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9912589Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9913545Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9914324Z #22 265.4 ptxas info : Compile time = 108.209 ms 2025-09-07T06:30:00.9916783Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9920717Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9923095Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9924065Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9924806Z #22 265.4 ptxas info : Compile time = 99.497 ms 2025-09-07T06:30:00.9927084Z #22 265.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9931060Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9933788Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9934116Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9934359Z #22 265.4 ptxas info : Compile time = 91.855 ms 2025-09-07T06:30:00.9936207Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9937907Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9938300Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9938580Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9938845Z #22 265.4 ptxas info : Compile time = 90.942 ms 2025-09-07T06:30:00.9940656Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9942378Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9942785Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:30:00.9943077Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9943317Z #22 265.4 ptxas info : Compile time = 104.047 ms 2025-09-07T06:30:00.9945194Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9946877Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9947507Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9947822Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9948065Z #22 265.4 ptxas info : Compile time = 80.440 ms 2025-09-07T06:30:00.9950194Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9951919Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9952300Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9952621Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9952881Z #22 265.4 ptxas info : Compile time = 71.963 ms 2025-09-07T06:30:00.9954761Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9956464Z #22 265.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9956874Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9957180Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9957418Z #22 265.4 ptxas info : Compile time = 64.877 ms 2025-09-07T06:30:00.9959288Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9961248Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9961642Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9961968Z #22 265.4 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:30:00.9962204Z #22 265.4 ptxas info : Compile time = 66.562 ms 2025-09-07T06:30:00.9964033Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9965741Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9966151Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9966452Z #22 265.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:00.9966716Z #22 265.4 ptxas info : Compile time = 134.650 ms 2025-09-07T06:30:00.9968541Z #22 265.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9970254Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9970656Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9971107Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9971362Z #22 265.4 ptxas info : Compile time = 105.657 ms 2025-09-07T06:30:00.9973509Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9975176Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9975564Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9975892Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9976149Z #22 265.4 ptxas info : Compile time = 91.463 ms 2025-09-07T06:30:00.9977991Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9979773Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9980160Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:30:00.9980462Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9980711Z #22 265.4 ptxas info : Compile time = 89.031 ms 2025-09-07T06:30:00.9982567Z #22 265.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:00.9984247Z #22 265.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:30:00.9984815Z #22 265.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:00.9985121Z #22 265.4 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:30:00.9985369Z #22 265.4 ptxas info : Compile time = 86.355 ms 2025-09-07T06:30:05.6238400Z #22 270.4 [4/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options 
''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:05.7786887Z #22 270.4 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:30:05.7791935Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7800326Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:30:05.7804809Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7805811Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:05.7806637Z #22 270.4 ptxas info : Compile time = 1.677 ms 2025-09-07T06:30:05.7811077Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7819550Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7823759Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7824801Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7825708Z #22 270.4 ptxas info : Compile time = 0.786 ms 2025-09-07T06:30:05.7830218Z #22 270.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7838467Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7843151Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7844100Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7844900Z #22 270.4 ptxas info : Compile time = 0.522 ms 2025-09-07T06:30:05.7849855Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7857968Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7862371Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7863313Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7864139Z #22 270.4 ptxas info : Compile time = 0.478 ms 2025-09-07T06:30:05.7868454Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7876741Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7881516Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7882430Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:05.7883267Z #22 270.4 ptxas info : Compile time = 0.482 ms 2025-09-07T06:30:05.7887446Z #22 270.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7895660Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7900365Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7901346Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7902253Z #22 270.4 ptxas info : Compile time = 0.467 ms 2025-09-07T06:30:05.7907229Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7915356Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7919788Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7920710Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7921497Z #22 270.4 ptxas info : Compile time = 0.469 ms 2025-09-07T06:30:05.7925967Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7934003Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7938609Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7939604Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7940405Z #22 270.4 ptxas info : Compile time = 0.514 ms 2025-09-07T06:30:05.7944645Z #22 270.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7954850Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7959012Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7959845Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:05.7960781Z #22 270.4 ptxas info : Compile time = 0.522 ms 2025-09-07T06:30:05.7964976Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7973647Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.7977763Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7978726Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.7979503Z #22 270.4 ptxas info : Compile time = 0.473 ms 2025-09-07T06:30:05.7981678Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.7985240Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:05.7987397Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.7988347Z #22 270.4 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:05.7989139Z #22 270.4 ptxas info : Compile time = 29.911 ms 2025-09-07T06:30:05.7993524Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.8001855Z #22 270.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8006202Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8007173Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.8007998Z #22 270.4 ptxas info : Compile time = 0.844 ms 2025-09-07T06:30:05.8012370Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.8020294Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8024381Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8025458Z #22 270.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:05.8026380Z #22 270.4 ptxas info : Compile time = 13.749 ms 2025-09-07T06:30:05.8030729Z #22 270.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.8038218Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8042725Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8043731Z #22 270.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:05.8044643Z #22 270.4 ptxas info : Compile time = 11.427 ms 2025-09-07T06:30:05.8049631Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.8058162Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8062748Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8063789Z #22 270.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:05.8064652Z #22 270.4 ptxas info : Compile time = 0.752 ms 2025-09-07T06:30:05.8066779Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:05.8069781Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8071649Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8072510Z #22 270.4 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:05.8073251Z #22 270.4 ptxas info : Compile time = 32.713 ms 2025-09-07T06:30:05.8073805Z #22 270.4 ptxas info : 11 bytes gmem 2025-09-07T06:30:05.8077448Z #22 270.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8084628Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8088489Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8089238Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:05.8089853Z #22 270.4 ptxas info : Compile time = 242.029 ms 2025-09-07T06:30:05.8093743Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8100592Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8104418Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8105176Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:05.8105781Z #22 270.4 ptxas info : Compile time = 240.829 ms 2025-09-07T06:30:05.8109405Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8116460Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8120258Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8121011Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:05.8121649Z #22 270.4 ptxas info : Compile time = 293.443 ms 2025-09-07T06:30:05.8125455Z #22 270.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8132674Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8136430Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8137179Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:05.8138052Z #22 270.4 ptxas info : Compile time = 267.375 ms 2025-09-07T06:30:05.8141885Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8148646Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8152723Z #22 270.4 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:30:05.8153642Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:30:05.8154453Z #22 270.4 ptxas info : Compile time = 693.593 ms 2025-09-07T06:30:05.8158185Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8165143Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8168932Z #22 270.4 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:30:05.8169871Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:30:05.8170656Z #22 270.4 ptxas info : Compile time = 679.583 ms 
2025-09-07T06:30:05.8174591Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8181542Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8185416Z #22 270.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:05.8186342Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:05.8187179Z #22 270.4 ptxas info : Compile time = 734.774 ms 2025-09-07T06:30:05.8191128Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8197936Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8201733Z #22 270.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:05.8202669Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:05.8203460Z #22 270.4 ptxas info : Compile time = 703.472 ms 2025-09-07T06:30:05.8207121Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8213705Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8217667Z #22 270.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:05.8218594Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:05.8219422Z #22 270.4 ptxas info : Compile time = 459.712 ms 2025-09-07T06:30:05.8223064Z #22 270.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8229831Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8233568Z #22 270.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:05.8234498Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:05.8235302Z #22 270.4 ptxas info : Compile time = 450.227 ms 2025-09-07T06:30:05.8237071Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8240309Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8242177Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8242921Z #22 270.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:30:05.8243525Z #22 270.4 ptxas info : Compile time = 21.771 ms 2025-09-07T06:30:05.8247332Z #22 270.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8254483Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8258171Z #22 270.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:05.8259078Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:05.8259880Z #22 270.4 ptxas info : Compile time = 489.774 ms 2025-09-07T06:30:05.8263434Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8270141Z #22 270.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8273837Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8274584Z #22 270.4 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:30:05.8275231Z #22 270.4 ptxas info : Compile time = 16.184 ms 2025-09-07T06:30:05.8278850Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8285154Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8288700Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8289452Z #22 270.4 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:30:05.8290317Z #22 270.4 ptxas info : Compile time = 16.174 ms 2025-09-07T06:30:05.8294262Z #22 270.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8300894Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:05.8304678Z #22 270.4 16 bytes stack frame, 20 bytes spill stores, 24 bytes spill loads 2025-09-07T06:30:05.8305583Z #22 270.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:05.8306386Z #22 270.4 ptxas info : Compile time = 460.954 ms 2025-09-07T06:30:05.8308248Z #22 270.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:05.8311280Z #22 270.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:05.8313382Z #22 270.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:05.8314110Z #22 270.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:30:05.8314747Z #22 270.4 ptxas info : Compile time = 24.360 ms 2025-09-07T06:30:10.6573796Z #22 275.4 [5/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:10.8112750Z #22 275.4 
ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:30:10.8117200Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8125179Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8129695Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8130673Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:10.8131651Z #22 275.4 ptxas info : Compile time = 1.687 ms 2025-09-07T06:30:10.8135714Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8143809Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8147920Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8149025Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8149842Z #22 275.4 ptxas info : Compile time = 0.749 ms 2025-09-07T06:30:10.8154059Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8161969Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8166501Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8167445Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8168309Z #22 275.4 ptxas info : Compile time = 0.511 ms 2025-09-07T06:30:10.8173179Z #22 275.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8181125Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8185616Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8186652Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8187516Z #22 275.4 ptxas info : Compile time = 0.498 ms 2025-09-07T06:30:10.8192065Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8200140Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8205058Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8206057Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:10.8206924Z #22 275.4 ptxas info : Compile time = 0.476 ms 2025-09-07T06:30:10.8211677Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8219886Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8224469Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8225553Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8226414Z #22 275.4 ptxas info : Compile time = 0.471 ms 2025-09-07T06:30:10.8230994Z #22 275.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8239391Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8244163Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8245215Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8246141Z #22 275.4 ptxas info : Compile time = 0.494 ms 2025-09-07T06:30:10.8250741Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8259000Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8263570Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8264997Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8265857Z #22 275.4 ptxas info : Compile time = 0.533 ms 2025-09-07T06:30:10.8270146Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8278054Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8282571Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8283644Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:10.8284532Z #22 275.4 ptxas info : Compile time = 0.542 ms 2025-09-07T06:30:10.8289003Z #22 275.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8297433Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8302060Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8303120Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8303992Z #22 275.4 ptxas info : Compile time = 0.601 ms 2025-09-07T06:30:10.8306115Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8309716Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8311959Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8312925Z #22 275.4 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:10.8313759Z #22 275.4 ptxas info : Compile time = 31.058 ms 2025-09-07T06:30:10.8318274Z #22 275.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8326601Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8331453Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8332491Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8333361Z #22 275.4 ptxas info : Compile time = 0.841 ms 2025-09-07T06:30:10.8337632Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8345677Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8350111Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8351008Z #22 275.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:10.8351801Z #22 275.4 ptxas info : Compile time = 13.551 ms 2025-09-07T06:30:10.8355960Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8363486Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8367678Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8368484Z #22 275.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:10.8369164Z #22 275.4 ptxas info : Compile time = 14.063 ms 2025-09-07T06:30:10.8372859Z #22 275.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8380493Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8384018Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8386627Z #22 275.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:10.8387326Z #22 275.4 ptxas info : Compile time = 0.862 ms 2025-09-07T06:30:10.8389023Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:10.8391813Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8393525Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8394307Z #22 275.4 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:10.8394991Z #22 275.4 ptxas info : Compile time = 34.874 ms 2025-09-07T06:30:10.8395519Z #22 275.4 ptxas info : 11 bytes gmem 
2025-09-07T06:30:10.8398993Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8405371Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8408892Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8409603Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:10.8410412Z #22 275.4 ptxas info : Compile time = 368.043 ms 2025-09-07T06:30:10.8414061Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8420381Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8423926Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8424626Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:10.8425238Z #22 275.4 ptxas info : Compile time = 371.695 ms 2025-09-07T06:30:10.8428749Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8435338Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8438830Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8439539Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:10.8440126Z #22 275.4 ptxas info : Compile time = 465.265 ms 2025-09-07T06:30:10.8443611Z #22 275.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8450284Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8453884Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8454584Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:10.8455189Z #22 275.4 ptxas info : Compile time = 416.224 ms 2025-09-07T06:30:10.8459005Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8465367Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8468889Z #22 275.4 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:30:10.8469790Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:30:10.8470549Z #22 275.4 ptxas info : Compile time = 1066.812 ms 2025-09-07T06:30:10.8474078Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8480503Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8485416Z #22 275.4 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:30:10.8486293Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:30:10.8487084Z #22 275.4 ptxas info : Compile time = 1056.837 ms 2025-09-07T06:30:10.8490583Z 
#22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8497123Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8500669Z #22 275.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:10.8501554Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:10.8502308Z #22 275.4 ptxas info : Compile time = 1154.766 ms 2025-09-07T06:30:10.8505810Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8512376Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8515871Z #22 275.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:10.8516748Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:10.8517520Z #22 275.4 ptxas info : Compile time = 1098.611 ms 2025-09-07T06:30:10.8520931Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8527068Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8530466Z #22 275.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:10.8531695Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:10.8532464Z #22 275.4 ptxas info : Compile time = 694.574 ms 2025-09-07T06:30:10.8535843Z #22 275.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8541935Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8545326Z #22 275.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:10.8546225Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:10.8546978Z #22 275.4 ptxas info : Compile time = 699.843 ms 2025-09-07T06:30:10.8548688Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8553736Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8555458Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8556191Z #22 275.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:30:10.8556792Z #22 275.4 ptxas info : Compile time = 37.390 ms 2025-09-07T06:30:10.8560574Z #22 275.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8566963Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8570499Z #22 275.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:10.8571525Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:10.8572296Z #22 275.4 ptxas info : Compile time = 767.809 ms 2025-09-07T06:30:10.8575616Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8581563Z #22 275.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8585150Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8585863Z #22 275.4 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:30:10.8586447Z #22 275.4 ptxas info : Compile time = 25.475 ms 2025-09-07T06:30:10.8589735Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8595712Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8599032Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8599731Z #22 275.4 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:30:10.8600403Z #22 275.4 ptxas info : Compile time = 24.741 ms 2025-09-07T06:30:10.8604231Z #22 275.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8610588Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:10.8614293Z #22 275.4 16 bytes stack frame, 20 bytes spill stores, 24 bytes spill loads 2025-09-07T06:30:10.8615156Z #22 275.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:10.8615950Z #22 275.4 ptxas info : Compile time = 709.784 ms 2025-09-07T06:30:10.8617671Z #22 275.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:10.8620450Z #22 275.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:10.8622175Z #22 275.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:10.8622876Z #22 275.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:30:10.8623476Z #22 275.4 ptxas info : Compile time = 41.307 ms 2025-09-07T06:30:34.4189206Z #22 299.2 [6/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:34.4203108Z #22 
299.2 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:30:34.4207017Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4213728Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4217305Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4218094Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:34.4218789Z #22 299.2 ptxas info : Compile time = 1.723 ms 2025-09-07T06:30:34.4222360Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4228835Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4232687Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4233476Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4234163Z #22 299.2 ptxas info : Compile time = 0.784 ms 2025-09-07T06:30:34.4237741Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4244245Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4247835Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4248634Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4249599Z #22 299.2 ptxas info : Compile time = 0.555 ms 2025-09-07T06:30:34.4253224Z #22 299.2 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4259966Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4263575Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4264353Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4265043Z #22 299.2 ptxas info : Compile time = 0.484 ms 2025-09-07T06:30:34.4268748Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4276484Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4280923Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4281925Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:34.4283139Z #22 299.2 ptxas info : Compile time = 0.471 ms 2025-09-07T06:30:34.4287395Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4295908Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4300390Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4301406Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4302257Z #22 299.2 ptxas info : Compile time = 0.468 ms 2025-09-07T06:30:34.4306643Z #22 299.2 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4315166Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4319834Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4320884Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4321721Z #22 299.2 ptxas info : Compile time = 0.465 ms 2025-09-07T06:30:34.4326505Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4335021Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4339708Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4340690Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4341546Z #22 299.2 ptxas info : Compile time = 0.461 ms 2025-09-07T06:30:34.4346009Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4354514Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4359016Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4360014Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:34.4360947Z #22 299.2 ptxas info : Compile time = 0.507 ms 2025-09-07T06:30:34.4365544Z #22 299.2 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4373691Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4378342Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4379398Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4380681Z #22 299.2 ptxas info : Compile time = 0.486 ms 2025-09-07T06:30:34.4382965Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4386690Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:34.4388976Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4390026Z #22 299.2 ptxas info : Used 105 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:34.4390939Z #22 299.2 ptxas info : Compile time = 94.304 ms 2025-09-07T06:30:34.4395698Z #22 299.2 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4403861Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4408779Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4409809Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4410647Z #22 299.2 ptxas info : Compile time = 1.025 ms 2025-09-07T06:30:34.4414588Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4422032Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:34.4426421Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4427243Z #22 299.2 ptxas info : Used 27 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:34.4427919Z #22 299.2 ptxas info : Compile time = 20.787 ms 2025-09-07T06:30:34.4431245Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4437526Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:34.4440870Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4441644Z #22 299.2 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:34.4442332Z #22 299.2 ptxas info : Compile time = 19.743 ms 2025-09-07T06:30:34.4445887Z #22 299.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4453023Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4456609Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4457407Z #22 299.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:34.4458171Z #22 299.2 ptxas info : Compile time = 0.778 ms 2025-09-07T06:30:34.4460318Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4463420Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:34.4465286Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4466210Z #22 299.2 ptxas info : Used 103 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:34.4466942Z #22 299.2 ptxas info : Compile time = 55.601 ms 2025-09-07T06:30:34.4467719Z #22 299.2 ptxas info : 11 
bytes gmem 2025-09-07T06:30:34.4471398Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4477977Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4481863Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4559377Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4560458Z #22 299.2 ptxas info : Compile time = 224.477 ms 2025-09-07T06:30:34.4564368Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4571023Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4574641Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4575352Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4575989Z #22 299.2 ptxas info : Compile time = 227.841 ms 2025-09-07T06:30:34.4579558Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4586125Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4590025Z #22 299.2 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:30:34.4590893Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers, 8 bytes cumulative stack size 2025-09-07T06:30:34.4591644Z #22 299.2 ptxas info : Compile time = 321.146 ms 2025-09-07T06:30:34.4595201Z #22 299.2 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4601720Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4605308Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4606015Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4606626Z #22 299.2 ptxas info : Compile time = 254.998 ms 2025-09-07T06:30:34.4611863Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4618448Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4622028Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4622760Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4623352Z #22 299.2 ptxas info : Compile time = 277.320 ms 2025-09-07T06:30:34.4626980Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4633486Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4637272Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4637983Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4638606Z #22 299.2 ptxas info : Compile time = 248.744 ms 2025-09-07T06:30:34.4642151Z #22 299.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4648643Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4652628Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4653329Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4653930Z #22 299.2 ptxas info : Compile time = 341.074 ms 2025-09-07T06:30:34.4657497Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4664257Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4667820Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4668532Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4669116Z #22 299.2 ptxas info : Compile time = 273.164 ms 2025-09-07T06:30:34.4672567Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4678856Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4682345Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4683153Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4683847Z #22 299.2 ptxas info : Compile time = 272.690 ms 2025-09-07T06:30:34.4687307Z #22 299.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4694044Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4697501Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4698234Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.4698822Z #22 299.2 ptxas info : Compile time = 247.196 ms 2025-09-07T06:30:34.4700551Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4703332Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:34.4705055Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4705778Z #22 299.2 ptxas info : Used 106 registers, used 0 barriers 2025-09-07T06:30:34.4706369Z #22 299.2 ptxas info : Compile time = 36.324 ms 2025-09-07T06:30:34.4710156Z #22 299.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4716703Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4720305Z #22 299.2 16 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:34.4721170Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:34.4721972Z #22 299.2 ptxas info : Compile time = 319.298 ms 2025-09-07T06:30:34.4725275Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4731439Z #22 299.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:34.5683348Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.5684437Z #22 299.2 ptxas info : Used 109 registers, used 1 barriers 2025-09-07T06:30:34.5685270Z #22 299.2 ptxas info : Compile time = 27.611 ms 2025-09-07T06:30:34.5690099Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.5699048Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:34.5703771Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.5704713Z #22 299.2 ptxas info : Used 89 registers, used 1 barriers 2025-09-07T06:30:34.5705551Z #22 299.2 ptxas info : Compile time = 23.118 ms 2025-09-07T06:30:34.5710637Z #22 299.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.5720176Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5725289Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.5726243Z #22 299.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:34.5727070Z #22 299.2 ptxas info : Compile time = 251.616 ms 2025-09-07T06:30:34.5729487Z #22 299.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.5733593Z #22 299.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:34.5736204Z #22 299.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.5737224Z #22 299.2 ptxas info : Used 96 registers, used 0 barriers 2025-09-07T06:30:34.5738068Z #22 299.2 ptxas info : Compile time = 41.788 ms 2025-09-07T06:30:37.4629888Z #22 302.2 [7/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:37.4647427Z #22 302.2 ptxas info : 131 bytes 
gmem, 112 bytes cmem[4] 2025-09-07T06:30:37.4652487Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4661580Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4666321Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4667224Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:37.4668032Z #22 302.2 ptxas info : Compile time = 1.682 ms 2025-09-07T06:30:37.4672651Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4680993Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4685680Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4686649Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4687895Z #22 302.2 ptxas info : Compile time = 0.759 ms 2025-09-07T06:30:37.4692560Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4701186Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4705841Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4706795Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4707574Z #22 302.2 ptxas info : Compile time = 0.517 ms 2025-09-07T06:30:37.4711800Z 
#22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4720542Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4725088Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4726067Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4726914Z #22 302.2 ptxas info : Compile time = 0.506 ms 2025-09-07T06:30:37.4731738Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4739862Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4744373Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4745371Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:37.4746198Z #22 302.2 ptxas info : Compile time = 0.479 ms 2025-09-07T06:30:37.4755139Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4763128Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4767634Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4768604Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4769471Z #22 302.2 ptxas info : Compile time = 0.469 ms 2025-09-07T06:30:37.4774125Z #22 
302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4782558Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4787390Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4788723Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4789625Z #22 302.2 ptxas info : Compile time = 0.467 ms 2025-09-07T06:30:37.4794590Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4802538Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4806996Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4807965Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4808809Z #22 302.2 ptxas info : Compile time = 0.464 ms 2025-09-07T06:30:37.4813345Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4821609Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4826088Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4827057Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:37.4827900Z #22 302.2 ptxas info : Compile time = 0.523 ms 2025-09-07T06:30:37.4832323Z #22 302.2 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4840594Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4844915Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4845723Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4846450Z #22 302.2 ptxas info : Compile time = 0.478 ms 2025-09-07T06:30:37.4848691Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4852221Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:37.4854550Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4855572Z #22 302.2 ptxas info : Used 119 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:37.4856398Z #22 302.2 ptxas info : Compile time = 103.892 ms 2025-09-07T06:30:37.4860889Z #22 
302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4869661Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4874218Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4875099Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4876388Z #22 302.2 ptxas info : Compile time = 0.938 ms 2025-09-07T06:30:37.4880908Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4889024Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:37.4893701Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4894820Z #22 302.2 ptxas info : Used 27 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:37.4895756Z #22 302.2 ptxas info : Compile time = 40.264 ms 2025-09-07T06:30:37.4900299Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4909029Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:37.4913937Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4915022Z #22 302.2 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:37.4915956Z #22 302.2 ptxas info : Compile time = 36.434 ms 2025-09-07T06:30:37.4921119Z #22 302.2 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4929834Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4934087Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4934961Z #22 302.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:37.4935690Z #22 302.2 ptxas info : Compile time = 0.794 ms 2025-09-07T06:30:37.4937624Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:37.4941121Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:37.4943117Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4943987Z #22 302.2 ptxas info : Used 121 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:37.4944725Z #22 302.2 ptxas info : Compile time = 58.819 ms 
2025-09-07T06:30:37.4945302Z #22 302.2 ptxas info : 11 bytes gmem 2025-09-07T06:30:37.4949589Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.4957146Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4961527Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4962402Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.4963107Z #22 302.2 ptxas info : Compile time = 225.964 ms 2025-09-07T06:30:37.4967851Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.4976050Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4979980Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.4980757Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.4981395Z #22 302.2 ptxas info : Compile time = 228.711 ms 2025-09-07T06:30:37.4985698Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.4993701Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.4998558Z #22 302.2 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:30:37.4999646Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers, 8 bytes cumulative stack size 2025-09-07T06:30:37.5000553Z #22 302.2 ptxas info : Compile time = 322.388 ms 2025-09-07T06:30:37.5005113Z 
#22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5013637Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5018209Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5019087Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5019922Z #22 302.2 ptxas info : Compile time = 256.461 ms 2025-09-07T06:30:37.5023900Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5031218Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5035273Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5036145Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5036841Z #22 302.2 ptxas info : Compile time = 280.006 ms 2025-09-07T06:30:37.5041315Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5049727Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5054220Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5055445Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5056182Z #22 302.2 ptxas info : Compile time = 251.132 ms 2025-09-07T06:30:37.5060571Z #22 302.2 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5068581Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5073057Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5073920Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5074671Z #22 302.2 ptxas info : Compile time = 342.812 ms 2025-09-07T06:30:37.5079110Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5089029Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5093668Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5094552Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5095264Z #22 302.2 ptxas info : Compile time = 276.763 ms 2025-09-07T06:30:37.5099663Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5107640Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5112112Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5113038Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5113708Z #22 302.2 ptxas info : Compile time = 276.493 ms 2025-09-07T06:30:37.5117378Z #22 302.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5124341Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5128109Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5128878Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5129515Z #22 302.2 ptxas info : Compile time = 248.046 ms 2025-09-07T06:30:37.5131572Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5134618Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:37.5136484Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5137232Z #22 302.2 ptxas info : Used 124 registers, used 0 barriers 2025-09-07T06:30:37.5137854Z #22 302.2 ptxas info : Compile time = 35.362 ms 2025-09-07T06:30:37.5142007Z #22 302.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5150856Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5155520Z #22 302.2 16 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:30:37.5156649Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:37.5157738Z #22 302.2 ptxas info : Compile time = 320.596 ms 2025-09-07T06:30:37.5162497Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5170601Z #22 302.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:37.5175520Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5176425Z #22 302.2 ptxas info : Used 109 registers, used 1 barriers 2025-09-07T06:30:37.5177154Z #22 302.2 ptxas info : Compile time = 27.825 ms 2025-09-07T06:30:37.5181027Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5188089Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:37.5192471Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5193361Z #22 302.2 ptxas info : Used 89 registers, used 1 barriers 2025-09-07T06:30:37.5194114Z #22 302.2 ptxas info : Compile time = 23.137 ms 2025-09-07T06:30:37.5198795Z #22 302.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5207891Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:37.5212749Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5213637Z #22 302.2 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:30:37.5214396Z #22 302.2 ptxas info : Compile time = 253.853 ms 2025-09-07T06:30:37.5216701Z #22 302.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:37.5220505Z #22 302.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:37.5222788Z #22 302.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:37.5223724Z #22 302.2 ptxas info : Used 122 registers, used 0 barriers 2025-09-07T06:30:37.5224486Z #22 302.2 ptxas info : Compile time = 41.468 ms 2025-09-07T06:30:51.4458571Z #22 316.2 [8/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:51.5957304Z #22 
316.2 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:30:51.5962262Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.5971664Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.5976407Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.5977249Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:51.5977968Z #22 316.2 ptxas info : Compile time = 1.956 ms 2025-09-07T06:30:51.5981843Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.5989080Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.5993045Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.5994251Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.5994982Z #22 316.2 ptxas info : Compile time = 0.968 ms 2025-09-07T06:30:51.5998832Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6005844Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6009745Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6010615Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6011546Z #22 316.2 ptxas info : Compile time = 0.647 ms 2025-09-07T06:30:51.6015386Z 
#22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6022655Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6026638Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6027475Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6028181Z #22 316.2 ptxas info : Compile time = 20.684 ms 2025-09-07T06:30:51.6032074Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6039186Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6043147Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6044021Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:51.6044757Z #22 316.2 ptxas info : Compile time = 0.799 ms 2025-09-07T06:30:51.6049073Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6056531Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6060400Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6061243Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6061983Z #22 316.2 ptxas info : Compile time = 0.573 ms 2025-09-07T06:30:51.6065894Z #22 
316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6072963Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6077155Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6078008Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6078714Z #22 316.2 ptxas info : Compile time = 0.507 ms 2025-09-07T06:30:51.6082514Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6089506Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6093589Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6094406Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6095121Z #22 316.2 ptxas info : Compile time = 0.482 ms 2025-09-07T06:30:51.6098795Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6105786Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6109515Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6110367Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:51.6111087Z #22 316.2 ptxas info : Compile time = 0.543 ms 2025-09-07T06:30:51.6114893Z #22 316.2 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6121784Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6125575Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6126449Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6127173Z #22 316.2 ptxas info : Compile time = 0.511 ms 2025-09-07T06:30:51.6129323Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6132592Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6134474Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6135294Z #22 316.2 ptxas info : Used 84 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:51.6136018Z #22 316.2 ptxas info : Compile time = 51.485 ms 2025-09-07T06:30:51.6139844Z #22 316.2 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6146844Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6151001Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6152136Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6152832Z #22 316.2 ptxas info : Compile time = 0.963 ms 2025-09-07T06:30:51.6156369Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6162791Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6166354Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6167193Z #22 316.2 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:51.6167919Z #22 316.2 ptxas info : Compile time = 34.924 ms 2025-09-07T06:30:51.6171705Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6178515Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6182143Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6182969Z #22 316.2 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:51.6183682Z #22 316.2 ptxas info : Compile time = 32.870 ms 2025-09-07T06:30:51.6187507Z #22 316.2 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6194639Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6198623Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6199576Z #22 316.2 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:51.6200405Z #22 316.2 ptxas info : Compile time = 0.758 ms 2025-09-07T06:30:51.6202539Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:51.6206375Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6208486Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6209437Z #22 316.2 ptxas info : Used 87 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:51.6210248Z #22 316.2 ptxas info : Compile time = 73.536 ms 
2025-09-07T06:30:51.6211017Z #22 316.2 ptxas info : 11 bytes gmem 2025-09-07T06:30:51.6215371Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6223526Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6228029Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6228934Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6229660Z #22 316.2 ptxas info : Compile time = 243.046 ms 2025-09-07T06:30:51.6234382Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6242389Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6246852Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6247722Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6248461Z #22 316.2 ptxas info : Compile time = 253.119 ms 2025-09-07T06:30:51.6253554Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6261740Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6266574Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6267440Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6268163Z #22 316.2 ptxas info : Compile time = 337.125 ms 2025-09-07T06:30:51.6272719Z #22 316.2 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6280870Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6285335Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6286207Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6286911Z #22 316.2 ptxas info : Compile time = 284.313 ms 2025-09-07T06:30:51.6291551Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6300010Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6304434Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6305270Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6306010Z #22 316.2 ptxas info : Compile time = 298.128 ms 2025-09-07T06:30:51.6310472Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6318592Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6323034Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6324129Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6324832Z #22 316.2 ptxas info : Compile time = 281.384 ms 2025-09-07T06:30:51.6329304Z #22 316.2 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6337583Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6342120Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6342970Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6343704Z #22 316.2 ptxas info : Compile time = 387.466 ms 2025-09-07T06:30:51.6348129Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6356772Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6361217Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6362085Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6362790Z #22 316.2 ptxas info : Compile time = 322.128 ms 2025-09-07T06:30:51.6367159Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6375213Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6379544Z #22 316.2 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:30:51.6380618Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:30:51.6381561Z #22 316.2 ptxas info : Compile time = 306.274 ms 2025-09-07T06:30:51.6386165Z #22 316.2 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6394009Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6398396Z #22 316.2 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:30:51.6399430Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:30:51.6400371Z #22 316.2 ptxas info : Compile time = 287.301 ms 2025-09-07T06:30:51.6402541Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6405997Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6408154Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6409026Z #22 316.2 ptxas info : Used 90 registers, used 0 barriers 2025-09-07T06:30:51.6409750Z #22 316.2 ptxas info : Compile time = 38.444 ms 2025-09-07T06:30:51.6414570Z #22 316.2 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6422774Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6427260Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6428158Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6428907Z #22 316.2 ptxas info : Compile time = 354.661 ms 2025-09-07T06:30:51.6433108Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6440591Z #22 316.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6445073Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6445951Z #22 316.2 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:30:51.6446661Z #22 316.2 ptxas info : Compile time = 32.653 ms 2025-09-07T06:30:51.6451287Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6458802Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6463001Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6463877Z #22 316.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:51.6464580Z #22 316.2 ptxas info : Compile time = 27.065 ms 2025-09-07T06:30:51.6469047Z #22 316.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6477589Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:51.6482073Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6482913Z #22 316.2 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:51.6483644Z #22 316.2 ptxas info : Compile time = 300.995 ms 2025-09-07T06:30:51.6485752Z #22 316.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:51.6489315Z #22 316.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:51.6491668Z #22 316.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:51.6492505Z #22 316.2 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:30:51.6493227Z #22 316.2 ptxas info : Compile time = 44.691 ms 2025-09-07T06:30:59.6562039Z #22 324.4 [9/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:59.6579221Z #22 
324.4 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:30:59.6583736Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6592217Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6596678Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6597642Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:59.6598463Z #22 324.4 ptxas info : Compile time = 1.725 ms 2025-09-07T06:30:59.6602888Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6611217Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6615609Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6616792Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6617597Z #22 324.4 ptxas info : Compile time = 0.766 ms 2025-09-07T06:30:59.6622012Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6629965Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6634439Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6635390Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6636195Z #22 324.4 ptxas info : Compile time = 0.534 ms 2025-09-07T06:30:59.6640545Z #22 324.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6648693Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6653415Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6654348Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6655151Z #22 324.4 ptxas info : Compile time = 0.492 ms 2025-09-07T06:30:59.6659428Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6667302Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6671687Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6672656Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:59.6673468Z #22 324.4 ptxas info : Compile time = 0.478 ms 2025-09-07T06:30:59.6677991Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6686452Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6690757Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6691881Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6692693Z #22 324.4 ptxas info : Compile time = 0.477 ms 2025-09-07T06:30:59.6696960Z #22 324.4 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6704906Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6709339Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6710603Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6711440Z #22 324.4 ptxas info : Compile time = 0.499 ms 2025-09-07T06:30:59.6715755Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6723682Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6727975Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6728904Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6729702Z #22 324.4 ptxas info : Compile time = 0.467 ms 2025-09-07T06:30:59.6734075Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6742208Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6746396Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6747363Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:30:59.6748193Z #22 324.4 ptxas info : Compile time = 0.520 ms 2025-09-07T06:30:59.6752746Z #22 324.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6760355Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6764463Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6765353Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6766272Z #22 324.4 ptxas info : Compile time = 0.485 ms 2025-09-07T06:30:59.6768644Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6772128Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:59.6774162Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6775083Z #22 324.4 ptxas info : Used 75 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:59.6775854Z #22 324.4 ptxas info : Compile time = 34.328 ms 2025-09-07T06:30:59.6780192Z #22 324.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6788075Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6792414Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6793349Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6794470Z #22 324.4 ptxas info : Compile time = 0.850 ms 2025-09-07T06:30:59.6798507Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6805902Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:59.6810048Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6811170Z #22 324.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:59.6811969Z #22 324.4 ptxas info : Compile time = 14.229 ms 2025-09-07T06:30:59.6815925Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6823372Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:59.6827409Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6828308Z #22 324.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:59.6829087Z #22 324.4 ptxas info : Compile time = 11.511 ms 2025-09-07T06:30:59.6833368Z #22 324.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6841281Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6845700Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6846676Z #22 324.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:30:59.6847499Z #22 324.4 ptxas info : Compile time = 0.758 ms 2025-09-07T06:30:59.6849983Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:59.6853803Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:59.6855879Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6856814Z #22 324.4 ptxas info : Used 78 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:59.6857599Z #22 324.4 ptxas info : Compile time = 38.131 ms 2025-09-07T06:30:59.6858200Z #22 324.4 ptxas info : 11 
bytes gmem 2025-09-07T06:30:59.6862378Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6870376Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6874701Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6875552Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6876243Z #22 324.4 ptxas info : Compile time = 244.105 ms 2025-09-07T06:30:59.6880852Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6888796Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6893195Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6894065Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6894799Z #22 324.4 ptxas info : Compile time = 246.683 ms 2025-09-07T06:30:59.6899155Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6907060Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6911582Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6912407Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6913125Z #22 324.4 ptxas info : Compile time = 339.306 ms 2025-09-07T06:30:59.6917513Z #22 324.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6925415Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6929772Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6930609Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6931491Z #22 324.4 ptxas info : Compile time = 286.419 ms 2025-09-07T06:30:59.6935875Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6944019Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6948482Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6949548Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6950262Z #22 324.4 ptxas info : Compile time = 303.398 ms 2025-09-07T06:30:59.6954686Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6962435Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6966713Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6967529Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6969283Z #22 324.4 ptxas info : Compile time = 274.030 ms 2025-09-07T06:30:59.6973719Z #22 324.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6981566Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.6985923Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.6986769Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.6987470Z #22 324.4 ptxas info : Compile time = 379.343 ms 2025-09-07T06:30:59.6991718Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.6999531Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.7004222Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.7005081Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.7005788Z #22 324.4 ptxas info : Compile time = 319.822 ms 2025-09-07T06:30:59.7010065Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.7017805Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.7021924Z #22 324.4 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:30:59.7023117Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:30:59.7024151Z #22 324.4 ptxas info : Compile time = 301.499 ms 2025-09-07T06:30:59.7028326Z #22 324.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.7036207Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.7040470Z #22 324.4 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:30:59.7041527Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:30:59.7042456Z #22 324.4 ptxas info : Compile time = 275.165 ms 2025-09-07T06:30:59.7044531Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.7047950Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:59.7050386Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.7051315Z #22 324.4 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:30:59.7052032Z #22 324.4 ptxas info : Compile time = 38.814 ms 2025-09-07T06:30:59.7056403Z #22 324.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.7065113Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.7069829Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.7070720Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.7071456Z #22 324.4 ptxas info : Compile time = 352.245 ms 2025-09-07T06:30:59.7075889Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.7083656Z #22 324.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:59.7088013Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.7089198Z #22 324.4 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:30:59.8053330Z #22 324.4 ptxas info : Compile time = 31.226 ms 2025-09-07T06:30:59.8057432Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.8064537Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:59.8068447Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.8069280Z #22 324.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:59.8069966Z #22 324.4 ptxas info : Compile time = 25.263 ms 2025-09-07T06:30:59.8074131Z #22 324.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.8082129Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:59.8086346Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.8087157Z #22 324.4 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:30:59.8087863Z #22 324.4 ptxas info : Compile time = 298.559 ms 2025-09-07T06:30:59.8089854Z #22 324.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:59.8093320Z #22 324.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:59.8095351Z #22 324.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:59.8096174Z #22 324.4 ptxas info : Used 74 registers, used 0 barriers 2025-09-07T06:30:59.8096832Z #22 324.4 ptxas info : Compile time = 45.203 ms 2025-09-07T06:31:00.1154432Z #22 324.9 [10/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 
2025-09-07T06:31:00.2692890Z #22 324.9 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:31:00.2697048Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2704936Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2708919Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2709826Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2710580Z #22 324.9 ptxas info : Compile time = 1.883 ms 2025-09-07T06:31:00.2715026Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2723163Z #22 324.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2727631Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2728594Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2729382Z #22 324.9 ptxas info : Compile time = 31.014 ms 2025-09-07T06:31:00.2733934Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2742511Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2747125Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2748113Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2749189Z #22 324.9 ptxas info : Compile time = 1.091 ms 
2025-09-07T06:31:00.2753519Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2761320Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2766080Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2767015Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2767833Z #22 324.9 ptxas info : Compile time = 0.670 ms 2025-09-07T06:31:00.2772479Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2780979Z #22 324.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2785607Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2786603Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2787422Z #22 324.9 ptxas info : Compile time = 0.619 ms 2025-09-07T06:31:00.2792289Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2801426Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2806203Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2807192Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2808036Z #22 324.9 ptxas info : Compile time = 0.574 ms 
2025-09-07T06:31:00.2812795Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2820662Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2825216Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2826271Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2827115Z #22 324.9 ptxas info : Compile time = 0.608 ms 2025-09-07T06:31:00.2832286Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2841095Z #22 324.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2845876Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2846884Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2847758Z #22 324.9 ptxas info : Compile time = 0.565 ms 2025-09-07T06:31:00.2852907Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:00.2861965Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2866712Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2867669Z #22 324.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:00.2868508Z #22 324.9 ptxas info : Compile time = 0.536 ms 
2025-09-07T06:31:00.2869180Z #22 324.9 ptxas info : 11 bytes gmem 2025-09-07T06:31:00.2873338Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.2881409Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2885715Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2886630Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:00.2887383Z #22 324.9 ptxas info : Compile time = 627.677 ms 2025-09-07T06:31:00.2892527Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.2901346Z #22 324.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2906190Z #22 324.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:00.2907241Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:00.2908205Z #22 324.9 ptxas info : Compile time = 817.920 ms 2025-09-07T06:31:00.2912968Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.2921857Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2926812Z #22 324.9 40 bytes stack frame, 84 bytes spill stores, 92 bytes spill loads 2025-09-07T06:31:00.2927932Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:00.2928854Z #22 324.9 ptxas 
info : Compile time = 1673.904 ms 2025-09-07T06:31:00.2933608Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.2941781Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.2946201Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.2947084Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:00.2947863Z #22 324.9 ptxas info : Compile time = 1261.631 ms 2025-09-07T06:31:00.3058194Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.3067069Z #22 324.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.3071816Z #22 324.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:00.3072842Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:00.3073798Z #22 324.9 ptxas info : Compile time = 1637.649 ms 2025-09-07T06:31:00.3078504Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.3086970Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.3092136Z #22 324.9 48 bytes stack frame, 104 bytes spill stores, 132 bytes spill loads 2025-09-07T06:31:00.3093240Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:31:00.3094142Z #22 324.9 
ptxas info : Compile time = 2850.311 ms 2025-09-07T06:31:00.3098342Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.3106188Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.3110505Z #22 324.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:00.3111373Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:00.3112133Z #22 324.9 ptxas info : Compile time = 851.816 ms 2025-09-07T06:31:00.3116877Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.3126325Z #22 324.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.3131186Z #22 324.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:00.3132250Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:00.3133144Z #22 324.9 ptxas info : Compile time = 1143.806 ms 2025-09-07T06:31:00.3138615Z #22 324.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:00.3147074Z #22 324.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:00.3151906Z #22 324.9 40 bytes stack frame, 92 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:00.3153292Z #22 324.9 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:00.3154249Z #22 324.9 ptxas 
info : Compile time = 2481.112 ms 2025-09-07T06:31:01.6738286Z #22 326.4 [11/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:01.6757109Z #22 326.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:31:01.6761770Z #22 326.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6770178Z #22 326.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6775075Z #22 326.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6776120Z #22 326.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6777010Z #22 326.4 ptxas info : Compile time = 1.607 ms 2025-09-07T06:31:01.6782015Z #22 326.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6791878Z #22 326.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6796865Z #22 326.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6797944Z #22 326.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6798829Z #22 326.4 ptxas info : Compile time = 0.829 ms 2025-09-07T06:31:01.6803861Z #22 326.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6813156Z #22 326.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6818022Z #22 326.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6819152Z #22 326.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6820150Z #22 326.4 ptxas info : Compile time = 0.723 ms 
2025-09-07T06:31:01.6825709Z #22 326.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6835456Z #22 326.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6840364Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6841365Z #22 326.5 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:31:01.6842260Z #22 326.5 ptxas info : Compile time = 0.488 ms 2025-09-07T06:31:01.6847222Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6896043Z #22 326.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6901416Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6902454Z #22 326.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6903380Z #22 326.5 ptxas info : Compile time = 0.466 ms 2025-09-07T06:31:01.6908383Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6917633Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6922641Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6923707Z #22 326.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6924478Z #22 326.5 ptxas info : Compile time = 0.456 ms 
2025-09-07T06:31:01.6929282Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6938440Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6943351Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6944396Z #22 326.5 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:31:01.6945327Z #22 326.5 ptxas info : Compile time = 0.480 ms 2025-09-07T06:31:01.6950713Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6959854Z #22 326.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6965176Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6966250Z #22 326.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6967154Z #22 326.5 ptxas info : Compile time = 0.450 ms 2025-09-07T06:31:01.6972313Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:01.6981436Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.6986473Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.6987521Z #22 326.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:01.6988436Z #22 326.5 ptxas info : Compile time = 0.845 ms 
2025-09-07T06:31:01.6989105Z #22 326.5 ptxas info : 11 bytes gmem 2025-09-07T06:31:01.6993650Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7001605Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7006083Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.7006972Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:01.7007743Z #22 326.5 ptxas info : Compile time = 577.357 ms 2025-09-07T06:31:01.7012720Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7021183Z #22 326.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7025763Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.7026892Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:01.7027626Z #22 326.5 ptxas info : Compile time = 718.258 ms 2025-09-07T06:31:01.7032166Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7040540Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7044940Z #22 326.5 32 bytes stack frame, 72 bytes spill stores, 80 bytes spill loads 2025-09-07T06:31:01.7046040Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:01.7047080Z #22 326.5 ptxas info : Compile time = 1434.384 
ms 2025-09-07T06:31:01.7052170Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7061298Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7065752Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.7066663Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:01.7067487Z #22 326.5 ptxas info : Compile time = 1314.318 ms 2025-09-07T06:31:01.7072561Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7081847Z #22 326.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7086903Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.7087838Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:01.7088635Z #22 326.5 ptxas info : Compile time = 1429.497 ms 2025-09-07T06:31:01.7094201Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7103051Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7108134Z #22 326.5 40 bytes stack frame, 96 bytes spill stores, 128 bytes spill loads 2025-09-07T06:31:01.7109287Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:01.7110350Z #22 326.5 ptxas info : Compile time = 
2501.861 ms 2025-09-07T06:31:01.7115192Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.7124207Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.7128997Z #22 326.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:01.8225680Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:01.8226962Z #22 326.5 ptxas info : Compile time = 1036.453 ms 2025-09-07T06:31:01.8232355Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.8242113Z #22 326.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.8247552Z #22 326.5 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:01.8249138Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:01.8250428Z #22 326.5 ptxas info : Compile time = 1031.753 ms 2025-09-07T06:31:01.8255872Z #22 326.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:01.8267056Z #22 326.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:01.8272147Z #22 326.5 32 bytes stack frame, 68 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:01.8273345Z #22 326.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:01.8274422Z #22 326.5 ptxas 
info : Compile time = 2007.285 ms 2025-09-07T06:31:07.2469388Z #22 332.0 [12/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:07.2487159Z #22 332.0 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:31:07.2583944Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2592370Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2597509Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2598604Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2599544Z #22 332.0 ptxas info : Compile time = 1.851 ms 2025-09-07T06:31:07.2604650Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2613660Z #22 332.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2618857Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2619904Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2620805Z #22 332.0 ptxas info : Compile time = 0.924 ms 2025-09-07T06:31:07.2625854Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2636279Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2641459Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2642535Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2643383Z #22 332.0 ptxas info : Compile time 
= 16.730 ms 2025-09-07T06:31:07.2647526Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2656912Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2661698Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2662893Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:31:07.2663680Z #22 332.0 ptxas info : Compile time = 0.683 ms 2025-09-07T06:31:07.2668581Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2677896Z #22 332.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2682414Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2683460Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2684368Z #22 332.0 ptxas info : Compile time = 0.521 ms 2025-09-07T06:31:07.2689389Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2698758Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2703762Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2704818Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2705728Z #22 332.0 ptxas info : Compile time 
= 0.478 ms 2025-09-07T06:31:07.2710676Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2719040Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2724035Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2725087Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:31:07.2726381Z #22 332.0 ptxas info : Compile time = 0.555 ms 2025-09-07T06:31:07.2731628Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2740237Z #22 332.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2745412Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2746494Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2747372Z #22 332.0 ptxas info : Compile time = 0.464 ms 2025-09-07T06:31:07.2752326Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.2761597Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2766759Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2767817Z #22 332.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:07.2768722Z #22 332.0 ptxas info : Compile time 
= 0.471 ms 2025-09-07T06:31:07.2769334Z #22 332.0 ptxas info : 11 bytes gmem 2025-09-07T06:31:07.2773231Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2781593Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2786320Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2787273Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.2788077Z #22 332.0 ptxas info : Compile time = 388.569 ms 2025-09-07T06:31:07.2792353Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2802051Z #22 332.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2806951Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2807746Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.2808383Z #22 332.0 ptxas info : Compile time = 477.072 ms 2025-09-07T06:31:07.2813452Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2822740Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2827608Z #22 332.0 32 bytes stack frame, 72 bytes spill stores, 80 bytes spill loads 2025-09-07T06:31:07.2828663Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:07.2829614Z #22 332.0 ptxas info : Compile 
time = 900.226 ms 2025-09-07T06:31:07.2834515Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2843279Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2847742Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2848711Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.2849822Z #22 332.0 ptxas info : Compile time = 1118.956 ms 2025-09-07T06:31:07.2855023Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2865575Z #22 332.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2870494Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2871408Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.2872240Z #22 332.0 ptxas info : Compile time = 1603.607 ms 2025-09-07T06:31:07.2877322Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2885633Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2890647Z #22 332.0 40 bytes stack frame, 96 bytes spill stores, 128 bytes spill loads 2025-09-07T06:31:07.2892357Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:07.2893430Z #22 332.0 ptxas info : Compile 
time = 2828.569 ms 2025-09-07T06:31:07.2898351Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2906343Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2911026Z #22 332.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.2911941Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.2912745Z #22 332.0 ptxas info : Compile time = 1089.059 ms 2025-09-07T06:31:07.2917818Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2926490Z #22 332.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2931615Z #22 332.0 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:07.2932734Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:07.2933782Z #22 332.0 ptxas info : Compile time = 1066.461 ms 2025-09-07T06:31:07.2938879Z #22 332.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.2947377Z #22 332.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.2952819Z #22 332.0 32 bytes stack frame, 68 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:07.2954049Z #22 332.0 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:07.2955043Z 
#22 332.0 ptxas info : Compile time = 2091.149 ms 2025-09-07T06:31:07.6399611Z #22 332.4 [13/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:07.7919725Z #22 332.4 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:31:07.7924571Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.7933448Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.7938364Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.7939411Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:07.7940296Z #22 332.4 ptxas info : Compile time = 1.674 ms 2025-09-07T06:31:07.7945116Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.7954581Z #22 332.4 ptxas info 
: Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.7959514Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.7960508Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.7961276Z #22 332.4 ptxas info : Compile time = 0.755 ms 2025-09-07T06:31:07.7966135Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.7975175Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.7979965Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.7980995Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.7981835Z #22 332.4 ptxas info : Compile time = 0.502 ms 
2025-09-07T06:31:07.7986994Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.7995803Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8000608Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8001621Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8002445Z #22 332.4 ptxas info : Compile time = 0.489 ms 2025-09-07T06:31:07.8007120Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8016049Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8020810Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8021909Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:07.8022841Z #22 332.4 ptxas info : Compile time = 0.475 ms 2025-09-07T06:31:07.8027420Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8036369Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8041242Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8042344Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8043302Z #22 332.4 ptxas info : Compile time = 0.464 ms 2025-09-07T06:31:07.8048131Z 
#22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8057835Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8062739Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8063778Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8064773Z #22 332.4 ptxas info : Compile time = 0.461 ms 2025-09-07T06:31:07.8069930Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8082481Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8087165Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8088520Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8089360Z #22 332.4 ptxas info : Compile time = 0.461 ms 2025-09-07T06:31:07.8094139Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8102340Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8106704Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8107689Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:07.8108582Z #22 332.4 ptxas info : Compile time = 0.511 ms 2025-09-07T06:31:07.8113031Z #22 332.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8123565Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8128338Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8129398Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8130343Z #22 332.4 ptxas info : Compile time = 0.478 ms 2025-09-07T06:31:07.8132812Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8136616Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8139162Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8140080Z #22 332.4 ptxas info : Used 47 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:07.8140945Z #22 332.4 ptxas info : Compile time = 18.659 ms 2025-09-07T06:31:07.8145429Z #22 332.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8154411Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8159194Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8160213Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8161071Z #22 332.4 ptxas info : Compile time = 0.895 ms 2025-09-07T06:31:07.8165359Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8173100Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8177493Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8178472Z #22 332.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:07.8179681Z #22 332.4 ptxas info : Compile time = 16.147 ms 2025-09-07T06:31:07.8184032Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8192009Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8196555Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8197525Z #22 332.4 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:07.8198411Z #22 332.4 ptxas info : Compile time = 11.779 ms 2025-09-07T06:31:07.8203007Z #22 332.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8211774Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8216565Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8217603Z #22 332.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:07.8218449Z #22 332.4 ptxas info : Compile time = 0.748 ms 2025-09-07T06:31:07.8220842Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:07.8224873Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8227292Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8228274Z #22 332.4 ptxas info : Used 52 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:07.8229104Z #22 332.4 ptxas info : Compile time = 22.360 ms 2025-09-07T06:31:07.8229781Z 
#22 332.4 ptxas info : 11 bytes gmem 2025-09-07T06:31:07.8234199Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8242662Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8247615Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8248505Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.8249617Z #22 332.4 ptxas info : Compile time = 295.195 ms 2025-09-07T06:31:07.8254377Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8263285Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8268074Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8268943Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.8269651Z #22 332.4 ptxas info : Compile time = 296.604 ms 2025-09-07T06:31:07.8274817Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8283698Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8288353Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8289331Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.8290136Z #22 332.4 ptxas info : Compile time = 376.728 ms 2025-09-07T06:31:07.8295103Z #22 332.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8303980Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8309264Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8310194Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:07.8310962Z #22 332.4 ptxas info : Compile time = 338.567 ms 2025-09-07T06:31:07.8315594Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8325000Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8329716Z #22 332.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:07.8330810Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:07.8331990Z #22 332.4 ptxas info : Compile time = 344.379 ms 2025-09-07T06:31:07.8336800Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8345995Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8351093Z #22 332.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:07.8352339Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:07.8353402Z #22 332.4 ptxas info : Compile time = 
320.843 ms 2025-09-07T06:31:07.8358235Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8366895Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8372070Z #22 332.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:07.8373704Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:07.8374768Z #22 332.4 ptxas info : Compile time = 404.734 ms 2025-09-07T06:31:07.8379801Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8388230Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8393294Z #22 332.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:07.8394389Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:07.8395397Z #22 332.4 ptxas info : Compile time = 362.203 ms 2025-09-07T06:31:07.8400162Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8409237Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8414032Z #22 332.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:07.8415141Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:07.8416140Z #22 332.4 ptxas info : Compile time = 336.700 ms 
2025-09-07T06:31:07.8420659Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8428895Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8433478Z #22 332.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:07.8434603Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:07.8435582Z #22 332.4 ptxas info : Compile time = 320.773 ms 2025-09-07T06:31:07.8438096Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8441656Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8443964Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8444913Z #22 332.4 ptxas info : Used 44 registers, used 0 barriers 2025-09-07T06:31:07.8445658Z #22 332.4 ptxas info : Compile time = 
21.808 ms 2025-09-07T06:31:07.8450682Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8459352Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8463838Z #22 332.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:07.8464970Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:07.8465941Z #22 332.4 ptxas info : Compile time = 390.351 ms 2025-09-07T06:31:07.8470553Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8478447Z #22 332.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8482878Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8483802Z #22 332.4 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:31:07.8484559Z #22 332.4 ptxas info : Compile time = 28.810 ms 2025-09-07T06:31:07.8488987Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8496891Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8501725Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8502678Z #22 332.4 ptxas info : Used 47 registers, used 1 barriers 2025-09-07T06:31:07.8503437Z #22 332.4 ptxas info : Compile time = 20.835 ms 2025-09-07T06:31:07.8507943Z #22 332.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8514585Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:07.8518252Z #22 332.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:07.8519134Z #22 332.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:07.8519898Z #22 332.4 ptxas info : Compile time = 340.659 ms 2025-09-07T06:31:07.8521672Z #22 332.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:07.8524596Z #22 332.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:07.8526756Z #22 332.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:07.8527545Z #22 332.4 ptxas info : Used 50 registers, used 0 barriers 2025-09-07T06:31:07.8528182Z #22 332.4 ptxas info : Compile time = 25.185 ms 2025-09-07T06:31:08.8971848Z #22 
333.7 [14/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:09.0544270Z #22 333.7 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:31:09.0547730Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0554233Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0559014Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0560219Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0564492Z #22 333.7 ptxas info : Compile time = 1.805 ms 2025-09-07T06:31:09.0569504Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0578269Z #22 333.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0583081Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0584077Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0584865Z #22 333.7 ptxas info : Compile time = 0.922 ms 2025-09-07T06:31:09.0589679Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0598292Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0603521Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0604564Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0605410Z #22 333.7 ptxas info : Compile time 
= 0.761 ms 2025-09-07T06:31:09.0609684Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0617784Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0622096Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0623105Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0624000Z #22 333.7 ptxas info : Compile time = 20.831 ms 2025-09-07T06:31:09.0628723Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0637694Z #22 333.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0642409Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0643417Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0644290Z #22 333.7 ptxas info : Compile time = 0.721 ms 2025-09-07T06:31:09.0649404Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0659861Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0665303Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0666815Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0667579Z #22 333.7 ptxas info : Compile time 
= 0.600 ms 2025-09-07T06:31:09.0672550Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0681757Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0685812Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0686583Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0687243Z #22 333.7 ptxas info : Compile time = 0.573 ms 2025-09-07T06:31:09.0690783Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0697850Z #22 333.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0701503Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0702274Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0702919Z #22 333.7 ptxas info : Compile time = 0.530 ms 2025-09-07T06:31:09.0706881Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:09.0716527Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0721884Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0722956Z #22 333.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:09.0723895Z #22 333.7 ptxas info : Compile time 
= 0.530 ms 2025-09-07T06:31:09.0724910Z #22 333.7 ptxas info : 11 bytes gmem 2025-09-07T06:31:09.0729909Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0739046Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0744154Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0745133Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:09.0745968Z #22 333.7 ptxas info : Compile time = 527.621 ms 2025-09-07T06:31:09.0751643Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0758485Z #22 333.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0762083Z #22 333.7 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:09.0762920Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:09.0763644Z #22 333.7 ptxas info : Compile time = 723.108 ms 2025-09-07T06:31:09.0767308Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0774890Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0780003Z #22 333.7 40 bytes stack frame, 84 bytes spill stores, 92 bytes spill loads 2025-09-07T06:31:09.0781176Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:09.0782157Z 
#22 333.7 ptxas info : Compile time = 1467.583 ms 2025-09-07T06:31:09.0786947Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0796464Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0801534Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0802412Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:09.0803261Z #22 333.7 ptxas info : Compile time = 1127.303 ms 2025-09-07T06:31:09.0808746Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0818821Z #22 333.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0822452Z #22 333.7 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:09.0823522Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:09.0824284Z #22 333.7 ptxas info : Compile time = 1485.383 ms 2025-09-07T06:31:09.0827822Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0834464Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0838187Z #22 333.7 48 bytes stack frame, 104 bytes spill stores, 132 bytes spill loads 2025-09-07T06:31:09.0839223Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 
2025-09-07T06:31:09.0840346Z #22 333.7 ptxas info : Compile time = 2728.023 ms 2025-09-07T06:31:09.0845052Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0853736Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0858304Z #22 333.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:09.0859246Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:09.0860025Z #22 333.7 ptxas info : Compile time = 763.864 ms 2025-09-07T06:31:09.0864594Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0873262Z #22 333.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0877852Z #22 333.7 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:09.0878918Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:09.0879886Z #22 333.7 ptxas info : Compile time = 1044.212 ms 2025-09-07T06:31:09.0884975Z #22 333.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:09.0893799Z #22 333.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:09.0898603Z #22 333.7 40 bytes stack frame, 92 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:09.0899577Z #22 333.7 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:09.0900552Z 
#22 333.7 ptxas info : Compile time = 2219.410 ms 2025-09-07T06:31:18.6305671Z #22 343.4 [15/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:18.7871733Z #22 343.4 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:31:18.7877080Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.7887208Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.7892878Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.7894031Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:18.7895025Z #22 343.4 ptxas info : Compile time = 1.740 ms 2025-09-07T06:31:18.7900274Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.7909740Z #22 343.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.7915093Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.7916241Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.7917579Z #22 343.4 ptxas info : Compile time = 0.760 ms 2025-09-07T06:31:18.7922921Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.7932929Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.7938381Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.7939413Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.7940295Z #22 343.4 ptxas info : Compile time = 0.535 ms 2025-09-07T06:31:18.7945606Z 
#22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.7955788Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.7961101Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.7962228Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.7963230Z #22 343.4 ptxas info : Compile time = 0.504 ms 2025-09-07T06:31:18.7968647Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.7978866Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.7984152Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.7985320Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:18.7986307Z #22 343.4 ptxas info : Compile time = 0.510 ms 2025-09-07T06:31:18.7991789Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8002088Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8007481Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8008627Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.8009588Z #22 343.4 ptxas info : Compile time = 0.545 ms 2025-09-07T06:31:18.8015162Z #22 343.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8025101Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8030562Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8031728Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.8032909Z #22 343.4 ptxas info : Compile time = 0.494 ms 2025-09-07T06:31:18.8038330Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8048272Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8054254Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8055415Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.8056415Z #22 343.4 ptxas info : Compile time = 0.492 ms 2025-09-07T06:31:18.8061596Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8071126Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8076312Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8077485Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:18.8078482Z #22 343.4 ptxas info : Compile time = 0.541 ms 2025-09-07T06:31:18.8083607Z #22 343.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8093168Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8098244Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8099380Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.8100391Z #22 343.4 ptxas info : Compile time = 0.512 ms 2025-09-07T06:31:18.8103169Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8107317Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8109762Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8110843Z #22 343.4 ptxas info : Used 48 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:18.8111727Z #22 343.4 ptxas info : Compile time = 18.674 ms 2025-09-07T06:31:18.8116195Z #22 343.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8125880Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8131484Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8132562Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.8133577Z #22 343.4 ptxas info : Compile time = 0.818 ms 2025-09-07T06:31:18.8138716Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8147523Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8152911Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8154126Z #22 343.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:18.8155109Z #22 343.4 ptxas info : Compile time = 13.350 ms 2025-09-07T06:31:18.8160117Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8169010Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8174443Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8175592Z #22 343.4 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:18.8176571Z #22 343.4 ptxas info : Compile time = 11.246 ms 2025-09-07T06:31:18.8182004Z #22 343.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8191944Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8197405Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8198589Z #22 343.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:18.8199589Z #22 343.4 ptxas info : Compile time = 0.860 ms 2025-09-07T06:31:18.8202172Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:18.8206326Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8209188Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8210339Z #22 343.4 ptxas info : Used 51 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:18.8211517Z #22 343.4 ptxas info : Compile time = 21.567 ms 2025-09-07T06:31:18.8212213Z #22 343.4 ptxas info : 11 
bytes gmem 2025-09-07T06:31:18.8217342Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8227147Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8232209Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8233197Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:18.8233980Z #22 343.4 ptxas info : Compile time = 294.164 ms 2025-09-07T06:31:18.8239483Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8249578Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8254952Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8255994Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:18.8256828Z #22 343.4 ptxas info : Compile time = 302.383 ms 2025-09-07T06:31:18.8262133Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8271803Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8277387Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8278443Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:18.8279292Z #22 343.4 ptxas info : Compile time = 377.460 ms 2025-09-07T06:31:18.8284703Z #22 343.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8294641Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8300017Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8301047Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:18.8301916Z #22 343.4 ptxas info : Compile time = 334.809 ms 2025-09-07T06:31:18.8307366Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8317516Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8322676Z #22 343.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:18.8324011Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:18.8325151Z #22 343.4 ptxas info : Compile time = 349.676 ms 2025-09-07T06:31:18.8330446Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8340446Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8345864Z #22 343.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:18.8347147Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:18.8348272Z #22 343.4 ptxas info : Compile time = 325.629 ms 
2025-09-07T06:31:18.8354198Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8364141Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8369627Z #22 343.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:18.8370915Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:18.8372191Z #22 343.4 ptxas info : Compile time = 399.596 ms 2025-09-07T06:31:18.8377591Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8387771Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8393275Z #22 343.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:18.8394484Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:18.8395521Z #22 343.4 ptxas info : Compile time = 357.389 ms 2025-09-07T06:31:18.8400708Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8410036Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8415388Z #22 343.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:18.8416632Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:18.8417755Z #22 343.4 ptxas info : Compile time = 336.444 ms 2025-09-07T06:31:18.8422897Z #22 
343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8432655Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8437835Z #22 343.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:18.8439080Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:18.8440139Z #22 343.4 ptxas info : Compile time = 321.746 ms 2025-09-07T06:31:18.8442680Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8446832Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8449778Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8450810Z #22 343.4 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:31:18.8451744Z #22 343.4 ptxas info : Compile time = 23.112 ms 2025-09-07T06:31:18.8457357Z #22 343.4 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8467031Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8472450Z #22 343.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:18.8473744Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:18.8474886Z #22 343.4 ptxas info : Compile time = 381.502 ms 2025-09-07T06:31:18.8479896Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8488875Z #22 343.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8494467Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8495517Z #22 343.4 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:31:18.8496383Z #22 343.4 ptxas info : Compile time = 28.492 ms 2025-09-07T06:31:18.8501289Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8510065Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8515115Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8516144Z #22 343.4 ptxas info : Used 47 registers, used 1 barriers 2025-09-07T06:31:18.8517002Z #22 343.4 ptxas info : Compile time = 20.214 ms 2025-09-07T06:31:18.8521955Z #22 343.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8531994Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:18.8536990Z #22 343.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:18.8538016Z #22 343.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:18.8539016Z #22 343.4 ptxas info : Compile time = 341.871 ms 2025-09-07T06:31:18.8541493Z #22 343.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:18.8545657Z #22 343.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:18.8548221Z #22 343.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:18.8552058Z #22 343.4 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:31:18.8552909Z #22 343.4 ptxas info : Compile time = 26.725 ms 2025-09-07T06:31:38.5852566Z #22 363.4 [16/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:38.5871933Z #22 
363.4 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:31:38.5876756Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5886557Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.5892041Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.5893191Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:38.5894056Z #22 363.4 ptxas info : Compile time = 1.379 ms 2025-09-07T06:31:38.5899047Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5908133Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.5913291Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.5914449Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.5915466Z #22 363.4 ptxas info : Compile time = 0.589 ms 2025-09-07T06:31:38.5918313Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5922570Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.5925140Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.5926296Z #22 363.4 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:38.5927300Z #22 363.4 ptxas info : Compile time = 27.387 ms 2025-09-07T06:31:38.5932133Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5941213Z #22 363.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.5946287Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.5947363Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.5948354Z #22 363.4 ptxas info : Compile time = 0.596 ms 2025-09-07T06:31:38.5953821Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5962570Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:38.5967466Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.5968540Z #22 363.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:38.5969558Z #22 363.4 ptxas info : Compile time = 10.190 ms 2025-09-07T06:31:38.5974822Z #22 363.4 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5984082Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.5989606Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.5990792Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.5991698Z #22 363.4 ptxas info : Compile time = 0.580 ms 2025-09-07T06:31:38.5994218Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.5998242Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6000737Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6001866Z #22 363.4 ptxas info : Used 72 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:38.6002762Z #22 363.4 ptxas info : Compile time = 27.824 ms 2025-09-07T06:31:38.6007965Z #22 363.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6017453Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6022560Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6024081Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:38.6025052Z #22 363.4 ptxas info : Compile time = 0.611 ms 2025-09-07T06:31:38.6030002Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6039311Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6044520Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6045602Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.6046514Z #22 363.4 ptxas info : Compile time = 0.457 ms 2025-09-07T06:31:38.6052111Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6061897Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6066964Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6068150Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.6069110Z #22 363.4 ptxas info : Compile time = 0.447 ms 2025-09-07T06:31:38.6074107Z #22 363.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6083380Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6088565Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6089627Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.6090621Z #22 363.4 ptxas info : Compile time = 0.426 ms 2025-09-07T06:31:38.6096178Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6105199Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6110205Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6111343Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:38.6112365Z #22 363.4 ptxas info : Compile time = 0.379 ms 2025-09-07T06:31:38.6117224Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6126157Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6131610Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6132774Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.6133768Z #22 363.4 ptxas info : Compile time = 0.430 ms 2025-09-07T06:31:38.6136184Z #22 363.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6140245Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6142717Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6143882Z #22 363.4 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:38.6144854Z #22 363.4 ptxas info : Compile time = 21.111 ms 2025-09-07T06:31:38.6150227Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6159599Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6168064Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6169274Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.6170231Z #22 363.4 ptxas info : Compile time = 0.595 ms 2025-09-07T06:31:38.6176165Z #22 363.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6184752Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6189501Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6190684Z #22 363.4 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:38.6191651Z #22 363.4 ptxas info : Compile time = 12.372 ms 2025-09-07T06:31:38.6196383Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6205577Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6210362Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6211747Z #22 363.4 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:38.6212719Z #22 363.4 ptxas info : Compile time = 9.132 ms 2025-09-07T06:31:38.6217743Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6227048Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6232129Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6233319Z #22 363.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:38.6234222Z #22 363.4 ptxas info : Compile time = 0.570 ms 2025-09-07T06:31:38.6237036Z #22 363.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:38.6241095Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6243626Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6244724Z #22 363.4 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:38.6245636Z #22 363.4 ptxas info : Compile time = 23.325 ms 2025-09-07T06:31:38.6246420Z #22 363.4 ptxas info : 11 bytes gmem 2025-09-07T06:31:38.6251926Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6261890Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6266982Z #22 363.4 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:38.6269849Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:38.6270916Z #22 363.4 ptxas info : Compile 
time = 429.518 ms 2025-09-07T06:31:38.6275837Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6284894Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6289662Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6290616Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:38.6291611Z #22 363.4 ptxas info : Compile time = 395.050 ms 2025-09-07T06:31:38.6293872Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6297887Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6300284Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6301213Z #22 363.4 ptxas info : Used 70 registers, used 0 barriers 2025-09-07T06:31:38.6301999Z #22 363.4 ptxas info : Compile time = 38.142 ms 
2025-09-07T06:31:38.6307206Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6316019Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6320607Z #22 363.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:38.6321848Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:38.6322891Z #22 363.4 ptxas info : Compile time = 510.311 ms 2025-09-07T06:31:38.6327228Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6335958Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6340969Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6341911Z #22 363.4 ptxas info : Used 64 registers, used 1 barriers 2025-09-07T06:31:38.6342717Z #22 363.4 ptxas info : Compile time = 26.091 ms 2025-09-07T06:31:38.6347523Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6356560Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6361318Z #22 363.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:38.6362493Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:38.6363485Z #22 363.4 ptxas info : Compile time = 475.106 ms 2025-09-07T06:31:38.6365774Z #22 363.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6370636Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6373510Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6374563Z #22 363.4 ptxas info : Used 72 registers, used 0 barriers 2025-09-07T06:31:38.6375478Z #22 363.4 ptxas info : Compile time = 49.146 ms 2025-09-07T06:31:38.6380468Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6389548Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6394792Z #22 363.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:38.6395963Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:38.6397015Z #22 363.4 ptxas info : Compile time = 434.725 ms 2025-09-07T06:31:38.6402844Z #22 363.4 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6413196Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6418413Z #22 363.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:38.6419695Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:38.6420898Z #22 363.4 ptxas info : Compile time = 396.716 ms 2025-09-07T06:31:38.6425481Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6433701Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6438337Z #22 363.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:38.6439890Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:38.6440943Z #22 363.4 ptxas info : Compile time = 524.228 ms 2025-09-07T06:31:38.6445501Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6453665Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6457584Z #22 363.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:38.6458754Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:38.6459749Z #22 363.4 ptxas info : Compile time = 425.325 ms 2025-09-07T06:31:38.6463554Z #22 
363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6471099Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6474893Z #22 363.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:38.6476052Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:38.6477067Z #22 363.4 ptxas info : Compile time = 435.981 ms 2025-09-07T06:31:38.6480763Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6488528Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6493302Z #22 363.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:38.6494450Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:38.6495527Z #22 363.4 ptxas info : Compile time = 388.862 ms 2025-09-07T06:31:38.6498222Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6501827Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6503802Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6504517Z #22 363.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:38.6505136Z #22 363.4 ptxas info : Compile time = 36.442 ms 2025-09-07T06:31:38.6509297Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6516957Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6521101Z #22 363.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:38.6522213Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:38.6523465Z #22 363.4 ptxas info : Compile time = 504.346 ms 2025-09-07T06:31:38.6527235Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6534466Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6538753Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6539723Z #22 363.4 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:31:38.6540559Z #22 363.4 ptxas info : Compile time = 37.631 ms 2025-09-07T06:31:38.6544886Z #22 363.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6553047Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6557419Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6558151Z #22 363.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:31:38.6558770Z #22 363.4 ptxas info : Compile time = 25.364 ms 2025-09-07T06:31:38.6562807Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6569752Z #22 363.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:38.6573983Z #22 363.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:38.6574893Z #22 363.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:38.6575696Z #22 363.4 ptxas info : Compile time = 407.173 ms 2025-09-07T06:31:38.6577907Z #22 363.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:38.6581537Z #22 363.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:38.6583337Z #22 363.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:38.6584071Z #22 363.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:38.6584821Z #22 363.4 ptxas info : Compile time = 40.236 ms 2025-09-07T06:31:41.5429635Z #22 366.3 [17/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm100.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm100.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm100.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:41.6957357Z #22 366.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:31:41.6966694Z #22 366.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.6976126Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.6981223Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.6982337Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:31:41.6983249Z #22 366.3 ptxas info : Compile time = 1.849 ms 2025-09-07T06:31:41.6988418Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7000612Z #22 366.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7005949Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7007030Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:31:41.7007993Z #22 366.3 ptxas info : Compile time = 0.933 ms 2025-09-07T06:31:41.7013467Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7023271Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7028942Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7030020Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes 
cmem[0] 2025-09-07T06:31:41.7030932Z #22 366.3 ptxas info : Compile time = 0.862 ms 2025-09-07T06:31:41.7035723Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7044958Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7050206Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7051385Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:31:41.7052298Z #22 366.3 ptxas info : Compile time = 0.591 ms 2025-09-07T06:31:41.7057277Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7067119Z #22 366.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7072416Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7073434Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:31:41.7074325Z #22 366.3 ptxas info : Compile time = 0.557 ms 2025-09-07T06:31:41.7079495Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7088618Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7094236Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7095324Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:31:41.7096620Z #22 366.3 ptxas info : Compile time = 
0.793 ms 2025-09-07T06:31:41.7101695Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7110526Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7115596Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7116718Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:31:41.7117620Z #22 366.3 ptxas info : Compile time = 0.603 ms 2025-09-07T06:31:41.7122695Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7132533Z #22 366.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7137727Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7138777Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:31:41.7139676Z #22 366.3 ptxas info : Compile time = 0.501 ms 2025-09-07T06:31:41.7144768Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.7153826Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7158467Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7159440Z #22 366.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:31:41.7160275Z #22 366.3 ptxas info : Compile time = 
0.477 ms 2025-09-07T06:31:41.7160907Z #22 366.3 ptxas info : 11 bytes gmem 2025-09-07T06:31:41.7165649Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7174405Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7179114Z #22 366.3 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:41.7180175Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:41.7181204Z #22 366.3 ptxas info : Compile time = 689.275 ms 2025-09-07T06:31:41.7186355Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7196197Z #22 366.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7201362Z #22 366.3 40 bytes stack frame, 116 bytes spill stores, 132 bytes spill loads 2025-09-07T06:31:41.7202551Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:41.7203610Z #22 366.3 ptxas info : Compile time = 884.964 ms 2025-09-07T06:31:41.7208776Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7218373Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7223573Z #22 366.3 136 bytes stack frame, 340 bytes spill stores, 568 bytes spill loads 2025-09-07T06:31:41.7224820Z #22 366.3 ptxas info : Used 168 
registers, used 16 barriers, 136 bytes cumulative stack size 2025-09-07T06:31:41.7225720Z #22 366.3 ptxas info : Compile time = 1791.788 ms 2025-09-07T06:31:41.7229747Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7236760Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7254811Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7255579Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:41.7256236Z #22 366.3 ptxas info : Compile time = 1262.530 ms 2025-09-07T06:31:41.7260287Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7267805Z #22 366.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7272208Z #22 366.3 32 bytes stack frame, 96 bytes spill stores, 104 bytes spill loads 2025-09-07T06:31:41.7273141Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:41.7273974Z #22 366.3 ptxas info : Compile time = 1457.183 ms 2025-09-07T06:31:41.7278028Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7285965Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7290807Z #22 366.3 56 bytes stack frame, 228 bytes spill stores, 308 bytes spill loads 2025-09-07T06:31:41.7292127Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:31:41.7293151Z #22 366.3 ptxas info : Compile time = 2610.113 ms 2025-09-07T06:31:41.7298486Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7307318Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7312551Z #22 366.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.7313493Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:41.7314308Z #22 366.3 ptxas info : Compile time = 868.390 ms 2025-09-07T06:31:41.7319421Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7328898Z #22 366.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7334622Z #22 366.3 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:41.7335722Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:41.7336723Z #22 366.3 ptxas info : Compile time = 1053.912 ms 2025-09-07T06:31:41.7341978Z #22 366.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.7351829Z #22 366.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.7357168Z #22 366.3 56 bytes stack frame, 280 bytes spill stores, 336 bytes spill loads 2025-09-07T06:31:41.7358354Z #22 366.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:31:41.7359342Z #22 366.3 ptxas info : Compile time = 1918.095 ms 2025-09-07T06:31:46.7765559Z #22 371.6 [18/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 
--generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:46.9330276Z #22 371.6 ptxas info : 131 bytes gmem, 112 bytes cmem[4] 2025-09-07T06:31:46.9335416Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9343819Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9348545Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9349762Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:46.9350568Z #22 371.6 ptxas info : Compile time = 1.886 ms 2025-09-07T06:31:46.9354970Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9362700Z #22 371.6 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9367584Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9368612Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9369449Z #22 371.6 ptxas info : Compile time = 0.902 ms 2025-09-07T06:31:46.9371728Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9375195Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9377338Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9378347Z #22 371.6 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:46.9379201Z #22 371.6 ptxas info : Compile time = 36.461 ms 2025-09-07T06:31:46.9383792Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:31:46.9392257Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9397420Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9398475Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9399326Z #22 371.6 ptxas info : Compile time = 0.783 ms 2025-09-07T06:31:46.9403710Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9411968Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9416505Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9417550Z #22 371.6 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:46.9418436Z #22 371.6 ptxas info : Compile 
time = 13.081 ms 2025-09-07T06:31:46.9423009Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9431757Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9436126Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9437089Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9437934Z #22 371.6 ptxas info : Compile time = 0.762 ms 2025-09-07T06:31:46.9439961Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9443497Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9445731Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9446695Z #22 371.6 ptxas info : Used 72 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:46.9447559Z #22 
371.6 ptxas info : Compile time = 40.663 ms 2025-09-07T06:31:46.9454314Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9463069Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9467709Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9468762Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:46.9469650Z #22 371.6 ptxas info : Compile time = 0.797 ms 2025-09-07T06:31:46.9474007Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9482346Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9487104Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9488138Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9489034Z #22 371.6 ptxas info : Compile time = 0.586 ms 2025-09-07T06:31:46.9494231Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9502667Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9507392Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9508445Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9509329Z #22 371.6 ptxas info : Compile time = 0.567 ms 2025-09-07T06:31:46.9513657Z #22 371.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9522038Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9526653Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9527720Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9528636Z #22 371.6 ptxas info : Compile time = 0.513 ms 2025-09-07T06:31:46.9533345Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9541608Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9546224Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9547242Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:31:46.9548105Z #22 371.6 ptxas info : Compile time = 0.539 ms 2025-09-07T06:31:46.9554565Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9563368Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9568508Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9569489Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9570382Z #22 371.6 ptxas info : Compile time = 0.500 ms 2025-09-07T06:31:46.9572711Z #22 371.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9576299Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9578652Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9579685Z #22 371.6 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:46.9580555Z #22 371.6 ptxas info : Compile time = 27.567 ms 2025-09-07T06:31:46.9584998Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9594527Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9599884Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9600992Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9601868Z #22 371.6 ptxas info : Compile time = 0.874 ms 2025-09-07T06:31:46.9606111Z #22 371.6 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9614179Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9618686Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9620008Z #22 371.6 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:46.9620912Z #22 371.6 ptxas info : Compile time = 20.655 ms 2025-09-07T06:31:46.9625335Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9633553Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9638150Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9639115Z #22 371.6 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:46.9639954Z #22 371.6 ptxas info : Compile time = 14.781 ms 2025-09-07T06:31:46.9644380Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9653499Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9658263Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9659259Z #22 371.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:31:46.9660142Z #22 371.6 ptxas info : Compile time = 0.926 ms 2025-09-07T06:31:46.9662303Z #22 371.6 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:46.9665928Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9668183Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9669188Z #22 371.6 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:46.9670028Z #22 371.6 ptxas info : Compile time = 36.524 ms 2025-09-07T06:31:46.9670668Z #22 371.6 ptxas info : 11 bytes gmem 2025-09-07T06:31:46.9675145Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9683880Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9688481Z #22 371.6 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:46.9689551Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 
2025-09-07T06:31:46.9690504Z #22 371.6 ptxas info : Compile time = 422.707 ms 2025-09-07T06:31:46.9695311Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9703718Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9708446Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9709806Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:31:46.9710586Z #22 371.6 ptxas info : Compile time = 422.443 ms 2025-09-07T06:31:46.9712877Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9716614Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9718911Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9719831Z #22 371.6 ptxas info : Used 70 registers, used 0 barriers 
2025-09-07T06:31:46.9720581Z #22 371.6 ptxas info : Compile time = 41.605 ms 2025-09-07T06:31:46.9725305Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9733891Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9738662Z #22 371.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:46.9739824Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:46.9740783Z #22 371.6 ptxas info : Compile time = 550.723 ms 2025-09-07T06:31:46.9745856Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9754540Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9759056Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9760016Z #22 371.6 ptxas info : Used 64 registers, used 1 barriers 2025-09-07T06:31:46.9760800Z #22 371.6 ptxas info : Compile time = 28.988 ms 2025-09-07T06:31:46.9765293Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9773920Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9778934Z #22 371.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:46.9780202Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:46.9781357Z #22 371.6 ptxas info : Compile time = 464.102 ms 2025-09-07T06:31:46.9783925Z #22 371.6 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9788317Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9790887Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9791975Z #22 371.6 ptxas info : Used 72 registers, used 0 barriers 2025-09-07T06:31:46.9792857Z #22 371.6 ptxas info : Compile time = 45.602 ms 2025-09-07T06:31:46.9798242Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9808570Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9814135Z #22 371.6 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:46.9815351Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:46.9816310Z #22 371.6 ptxas info : Compile time = 423.957 ms 2025-09-07T06:31:46.9821566Z 
#22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9831422Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9836784Z #22 371.6 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:46.9838069Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:31:46.9839064Z #22 371.6 ptxas info : Compile time = 387.051 ms 2025-09-07T06:31:46.9844579Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9855355Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9860562Z #22 371.6 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:46.9861944Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:46.9863090Z #22 371.6 ptxas info : Compile time = 505.714 ms 2025-09-07T06:31:46.9868232Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9877988Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9883151Z #22 371.6 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:46.9884827Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:46.9886001Z #22 371.6 ptxas info : Compile time = 411.724 ms 
2025-09-07T06:31:46.9890798Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9899759Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9904202Z #22 371.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:46.9905311Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:46.9906265Z #22 371.6 ptxas info : Compile time = 425.722 ms 2025-09-07T06:31:46.9911007Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9919554Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9924109Z #22 371.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:46.9925185Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:46.9926211Z #22 371.6 ptxas info : Compile time = 389.590 ms 2025-09-07T06:31:46.9928416Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9932397Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9934887Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9935936Z #22 371.6 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:46.9936829Z #22 371.6 ptxas info : Compile time = 35.762 ms 2025-09-07T06:31:46.9942172Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9952630Z #22 371.6 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:46.9957989Z #22 371.6 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:46.9959209Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:46.9960363Z #22 371.6 ptxas info : Compile time = 498.655 ms 2025-09-07T06:31:46.9965318Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9974308Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:46.9979424Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:46.9980349Z #22 371.6 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:31:46.9981618Z #22 371.6 ptxas info : Compile time = 40.298 ms 2025-09-07T06:31:46.9986576Z #22 371.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:46.9995822Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:47.0000828Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:47.0001910Z #22 371.6 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:31:47.0002808Z #22 371.6 ptxas info : Compile time = 26.221 ms 2025-09-07T06:31:47.0008039Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:47.0018095Z #22 371.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:47.0023553Z #22 371.6 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:47.0024777Z #22 371.6 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:31:47.0025715Z #22 371.6 ptxas info : Compile time = 406.958 ms 2025-09-07T06:31:47.0028386Z #22 371.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:47.0032471Z #22 371.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:47.0035147Z #22 371.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:47.0036222Z #22 371.6 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:47.0037112Z #22 371.6 ptxas info : Compile time = 39.660 ms 2025-09-07T06:32:00.8190421Z #22 385.6 [19/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:00.9684943Z #22 385.6 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:32:00.9690362Z #22 385.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9700039Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9705161Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9706410Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:00.9707465Z #22 385.6 ptxas info : Compile time = 24.112 ms 2025-09-07T06:32:00.9712334Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9721665Z #22 385.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9727000Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9728272Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:00.9729367Z #22 385.6 ptxas info : Compile time = 1.064 ms 2025-09-07T06:32:00.9734694Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9745044Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9750666Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9751822Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:00.9752879Z 
#22 385.6 ptxas info : Compile time = 0.908 ms 2025-09-07T06:32:00.9757844Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9767968Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9773610Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9774858Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:00.9775883Z #22 385.6 ptxas info : Compile time = 0.626 ms 2025-09-07T06:32:00.9780598Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9789195Z #22 385.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9794215Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9795337Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:32:00.9796337Z #22 385.6 ptxas info : Compile time = 0.604 ms 2025-09-07T06:32:00.9801162Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9810775Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9816184Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9817373Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:00.9818457Z #22 385.6 ptxas info : Compile time = 0.616 ms 
2025-09-07T06:32:00.9823465Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.9832686Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.9838060Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.9839590Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:00.9840683Z #22 385.6 ptxas info : Compile time = 0.561 ms 2025-09-07T06:32:00.9845516Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.0053671Z #22 385.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0057355Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.0058152Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:32:01.0058819Z #22 385.6 ptxas info : Compile time = 0.499 ms 2025-09-07T06:32:01.0062533Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.0069737Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0073426Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.0074214Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:01.0074898Z #22 385.6 ptxas info : Compile time = 0.488 ms 
2025-09-07T06:32:01.0078627Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.0085411Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0089132Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.0089923Z #22 385.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:01.0090636Z #22 385.6 ptxas info : Compile time = 0.458 ms 2025-09-07T06:32:01.0091685Z #22 385.6 ptxas info : 11 bytes gmem 2025-09-07T06:32:01.0095178Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0101676Z #22 385.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0105259Z #22 385.6 24 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:32:01.0106124Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack size 2025-09-07T06:32:01.0106895Z #22 385.6 ptxas info : Compile time = 834.313 ms 2025-09-07T06:32:01.0110514Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0117373Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0120977Z #22 385.6 40 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads 2025-09-07T06:32:01.0121855Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:01.0122614Z #22 385.6 ptxas info 
: Compile time = 834.377 ms 2025-09-07T06:32:01.0126485Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0133765Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0137634Z #22 385.6 48 bytes stack frame, 96 bytes spill stores, 100 bytes spill loads 2025-09-07T06:32:01.0138513Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:32:01.0139277Z #22 385.6 ptxas info : Compile time = 1018.396 ms 2025-09-07T06:32:01.0143411Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0150750Z #22 385.6 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0154623Z #22 385.6 120 bytes stack frame, 360 bytes spill stores, 468 bytes spill loads 2025-09-07T06:32:01.0155518Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 120 bytes cumulative stack size 2025-09-07T06:32:01.0156289Z #22 385.6 ptxas info : Compile time = 2098.811 ms 2025-09-07T06:32:01.0160053Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0167021Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0170622Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.0171536Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:01.0172143Z #22 385.6 ptxas info : 
Compile time = 1436.126 ms 2025-09-07T06:32:01.0175847Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0182637Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0186376Z #22 385.6 40 bytes stack frame, 92 bytes spill stores, 112 bytes spill loads 2025-09-07T06:32:01.0187277Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:01.0188058Z #22 385.6 ptxas info : Compile time = 1628.958 ms 2025-09-07T06:32:01.0192156Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0198903Z #22 385.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0202607Z #22 385.6 56 bytes stack frame, 184 bytes spill stores, 228 bytes spill loads 2025-09-07T06:32:01.0203497Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:01.0204273Z #22 385.6 ptxas info : Compile time = 2639.260 ms 2025-09-07T06:32:01.0207888Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0214655Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0218482Z #22 385.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.0219205Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:01.0219795Z #22 385.6 ptxas info : Compile time = 933.696 ms 
2025-09-07T06:32:01.0223464Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0230250Z #22 385.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0233961Z #22 385.6 32 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads 2025-09-07T06:32:01.0234840Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:32:01.0235644Z #22 385.6 ptxas info : Compile time = 1053.637 ms 2025-09-07T06:32:01.0239568Z #22 385.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.0246291Z #22 385.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.0250566Z #22 385.6 56 bytes stack frame, 224 bytes spill stores, 256 bytes spill loads 2025-09-07T06:32:01.0251589Z #22 385.6 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:01.0252342Z #22 385.6 ptxas info : Compile time = 1988.121 ms 2025-09-07T06:32:02.6823201Z #22 387.5 [20/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE 
-std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:02.6837804Z #22 387.5 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:32:02.6841496Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6848551Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6852686Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6853484Z #22 387.5 
ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6854155Z #22 387.5 ptxas info : Compile time = 1.627 ms 2025-09-07T06:32:02.6858021Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6865238Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6869170Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6869959Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6870646Z #22 387.5 ptxas info : Compile time = 0.817 ms 2025-09-07T06:32:02.6874555Z #22 387.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6882116Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6886025Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6886808Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6887482Z #22 387.5 ptxas info : Compile time = 0.704 ms 2025-09-07T06:32:02.6891107Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6897384Z #22 387.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6901183Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6901967Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6902642Z #22 387.5 ptxas info : Compile time = 0.492 ms 2025-09-07T06:32:02.6906358Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6913226Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6916998Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6917809Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6918482Z #22 387.5 ptxas info : Compile time = 0.458 ms 
2025-09-07T06:32:02.6922299Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6929371Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6933248Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6934036Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6934728Z #22 387.5 ptxas info : Compile time = 20.579 ms 2025-09-07T06:32:02.6938148Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6944409Z #22 387.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6947898Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6948688Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6952208Z #22 387.5 ptxas info : Compile time = 0.965 ms 2025-09-07T06:32:02.6956049Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6962852Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6966583Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6967378Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6968059Z #22 387.5 ptxas info : Compile time = 0.696 ms 
2025-09-07T06:32:02.6971942Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:02.6979020Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6982750Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6983516Z #22 387.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:02.6984197Z #22 387.5 ptxas info : Compile time = 0.602 ms 2025-09-07T06:32:02.6984710Z #22 387.5 ptxas info : 11 bytes gmem 2025-09-07T06:32:02.6988272Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.6994898Z #22 387.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.6998530Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.6999242Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:02.6999853Z #22 387.5 ptxas info : Compile time = 829.188 ms 2025-09-07T06:32:02.7004005Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.7011369Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.7015267Z #22 387.5 56 bytes stack frame, 168 bytes spill stores, 192 bytes spill loads 2025-09-07T06:32:02.7016150Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:32:02.7016928Z #22 387.5 ptxas info : Compile time = 1176.318 ms 2025-09-07T06:32:02.7020796Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.7028117Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.7032015Z #22 387.5 152 bytes stack frame, 432 bytes spill stores, 576 bytes spill loads 2025-09-07T06:32:02.7032900Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers, 152 bytes cumulative stack size 2025-09-07T06:32:02.7033673Z #22 387.5 ptxas info : Compile time = 2267.186 ms 2025-09-07T06:32:02.7037160Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.7043446Z #22 387.5 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.7046887Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.7047620Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:02.7048245Z #22 387.5 ptxas info : Compile time = 1225.665 ms 2025-09-07T06:32:02.7153301Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.7160356Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.7164206Z #22 387.5 64 bytes stack frame, 176 bytes spill stores, 200 bytes spill loads 2025-09-07T06:32:02.7165112Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:32:02.7165896Z #22 387.5 ptxas info : Compile 
time = 1776.120 ms 2025-09-07T06:32:02.7173365Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.7180325Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.7184446Z #22 387.5 96 bytes stack frame, 228 bytes spill stores, 276 bytes spill loads 2025-09-07T06:32:02.7185342Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers, 96 bytes cumulative stack size 2025-09-07T06:32:02.7186103Z #22 387.5 ptxas info : Compile time = 2871.501 ms 2025-09-07T06:32:02.7189583Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.7195912Z #22 387.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.7199368Z #22 387.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:02.8317575Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:02.8318277Z #22 387.5 ptxas info : Compile time = 857.214 ms 2025-09-07T06:32:02.8322551Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.8329581Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.8333587Z #22 387.5 64 bytes stack frame, 176 bytes spill stores, 188 bytes spill loads 2025-09-07T06:32:02.8334508Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:32:02.8335284Z #22 387.5 ptxas info : Compile time = 1217.946 ms 
2025-09-07T06:32:02.8339063Z #22 387.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:02.8345980Z #22 387.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:02.8350648Z #22 387.5 120 bytes stack frame, 252 bytes spill stores, 292 bytes spill loads 2025-09-07T06:32:02.8351543Z #22 387.5 ptxas info : Used 168 registers, used 16 barriers, 120 bytes cumulative stack size 2025-09-07T06:32:02.8352318Z #22 387.5 ptxas info : Compile time = 2176.702 ms 2025-09-07T06:32:11.5176733Z #22 396.3 [21/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:11.6671378Z #22 396.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:32:11.6675040Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6681706Z #22 396.3 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6685337Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6686144Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6686823Z #22 396.3 ptxas info : Compile time = 1.874 ms 2025-09-07T06:32:11.6690708Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6698481Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6702368Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6703177Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:32:11.6703854Z #22 396.3 ptxas info : Compile time = 21.078 ms 2025-09-07T06:32:11.6707726Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6715243Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6719122Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6719905Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6720565Z #22 396.3 ptxas info : Compile time = 1.037 ms 2025-09-07T06:32:11.6723963Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6730163Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6733765Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6734531Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6735208Z #22 396.3 ptxas info : Compile time = 0.584 ms 2025-09-07T06:32:11.6738891Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6745837Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6749821Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6750614Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6751412Z #22 396.3 ptxas info : Compile time = 0.504 ms 2025-09-07T06:32:11.6756035Z #22 
396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6765111Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6770391Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6772118Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6773095Z #22 396.3 ptxas info : Compile time = 0.464 ms 2025-09-07T06:32:11.6777885Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6786000Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6790246Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6791247Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6792114Z #22 396.3 ptxas info : Compile time = 0.546 ms 2025-09-07T06:32:11.6796850Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6806129Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6811109Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6812152Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6813029Z #22 396.3 ptxas info : Compile time = 0.498 ms 2025-09-07T06:32:11.6817834Z #22 
396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:11.6826724Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6831610Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6832724Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:11.6833674Z #22 396.3 ptxas info : Compile time = 0.461 ms 2025-09-07T06:32:11.6834298Z #22 396.3 ptxas info : 11 bytes gmem 2025-09-07T06:32:11.6838184Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.6844683Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6848308Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6849419Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:11.6850130Z #22 396.3 ptxas info : Compile time = 795.011 ms 2025-09-07T06:32:11.6854711Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.6864067Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6869474Z #22 396.3 56 bytes stack frame, 168 bytes spill stores, 192 bytes spill loads 2025-09-07T06:32:11.6870650Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:11.6871673Z 
#22 396.3 ptxas info : Compile time = 1130.245 ms 2025-09-07T06:32:11.6876741Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.6886386Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6891716Z #22 396.3 152 bytes stack frame, 432 bytes spill stores, 576 bytes spill loads 2025-09-07T06:32:11.6892827Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 152 bytes cumulative stack size 2025-09-07T06:32:11.6893878Z #22 396.3 ptxas info : Compile time = 2200.070 ms 2025-09-07T06:32:11.6898929Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.6906931Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6911255Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.6912211Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:11.6913019Z #22 396.3 ptxas info : Compile time = 1259.774 ms 2025-09-07T06:32:11.6917961Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.6927004Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.6932816Z #22 396.3 64 bytes stack frame, 176 bytes spill stores, 200 bytes spill loads 2025-09-07T06:32:11.6934005Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:32:11.6935021Z #22 396.3 ptxas info : Compile time = 1749.073 ms 
2025-09-07T06:32:11.6939897Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.7049869Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.7054020Z #22 396.3 96 bytes stack frame, 228 bytes spill stores, 276 bytes spill loads 2025-09-07T06:32:11.7054910Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 96 bytes cumulative stack size 2025-09-07T06:32:11.7055685Z #22 396.3 ptxas info : Compile time = 2958.009 ms 2025-09-07T06:32:11.7059539Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.7065787Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.7069230Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:11.7069971Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:11.7070563Z #22 396.3 ptxas info : Compile time = 899.783 ms 2025-09-07T06:32:11.7074306Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.7081039Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.7085134Z #22 396.3 64 bytes stack frame, 176 bytes spill stores, 188 bytes spill loads 2025-09-07T06:32:11.7086013Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:32:11.7086783Z #22 396.3 ptxas info : Compile time = 1350.198 ms 
2025-09-07T06:32:11.7090442Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:11.7097375Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:11.7101113Z #22 396.3 120 bytes stack frame, 252 bytes spill stores, 292 bytes spill loads 2025-09-07T06:32:11.7102014Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 120 bytes cumulative stack size 2025-09-07T06:32:11.7102775Z #22 396.3 ptxas info : Compile time = 2369.911 ms 2025-09-07T06:32:16.5118047Z #22 401.3 [22/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:16.6701779Z #22 401.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:32:16.6706679Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6715376Z #22 401.3 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6719925Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6720896Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:16.6721746Z #22 401.3 ptxas info : Compile time = 1.629 ms 2025-09-07T06:32:16.6726383Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6734750Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6739285Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6740574Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:16.6741452Z #22 401.3 ptxas info : Compile time = 0.771 
ms 2025-09-07T06:32:16.6746328Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6755458Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6760275Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6761220Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:16.6762107Z #22 401.3 ptxas info : Compile time = 0.705 ms 2025-09-07T06:32:16.6766857Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6776332Z #22 401.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6781316Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6782182Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:16.6783043Z #22 401.3 ptxas info : Compile time = 0.470 ms 2025-09-07T06:32:16.6787549Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6795722Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6800164Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6801152Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:16.6802003Z #22 401.3 ptxas info : Compile time = 0.458 ms 
2025-09-07T06:32:16.6807057Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6815709Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6820217Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6821245Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:16.6822039Z #22 401.3 ptxas info : Compile time = 20.721 ms 2025-09-07T06:32:16.6826590Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6835223Z #22 401.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6839910Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6840950Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:16.6841830Z #22 401.3 ptxas info : Compile time = 0.804 ms 2025-09-07T06:32:16.6846141Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6860660Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6865102Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6866060Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:16.6866909Z #22 401.3 ptxas info : Compile time = 0.629 ms 2025-09-07T06:32:16.6871848Z 
#22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6880363Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6885214Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6886119Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:16.6886868Z #22 401.3 ptxas info : Compile time = 0.592 ms 2025-09-07T06:32:16.6891676Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:16.6900062Z #22 401.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6905399Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.6906462Z #22 401.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:16.6907386Z #22 401.3 ptxas info : Compile time = 0.529 ms 2025-09-07T06:32:16.6908054Z #22 401.3 ptxas info : 11 bytes gmem 2025-09-07T06:32:16.6913048Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.6921956Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6926837Z #22 401.3 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:32:16.6928007Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 
2025-09-07T06:32:16.6929038Z #22 401.3 ptxas info : Compile time = 764.625 ms 2025-09-07T06:32:16.6934445Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.6941705Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6945721Z #22 401.3 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:32:16.6946737Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack size 2025-09-07T06:32:16.6947570Z #22 401.3 ptxas info : Compile time = 776.704 ms 2025-09-07T06:32:16.6952214Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.6960577Z #22 401.3 ptxas info : Function properties 
for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6965566Z #22 401.3 40 bytes stack frame, 116 bytes spill stores, 132 bytes spill loads 2025-09-07T06:32:16.6966768Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:16.6967781Z #22 401.3 ptxas info : Compile time = 950.527 ms 2025-09-07T06:32:16.6973215Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.6983042Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.6988496Z #22 401.3 136 bytes stack frame, 340 bytes spill stores, 568 bytes spill loads 2025-09-07T06:32:16.6989722Z #22 401.3 ptxas info : Used 168 registers, used 16 
barriers, 136 bytes cumulative stack size 2025-09-07T06:32:16.6990757Z #22 401.3 ptxas info : Compile time = 1778.445 ms 2025-09-07T06:32:16.6996093Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.7005021Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.7010026Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.7011195Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:16.7011992Z #22 401.3 ptxas info : Compile time = 1189.567 ms 2025-09-07T06:32:16.7017156Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.7026506Z #22 401.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.7031972Z #22 401.3 32 bytes stack frame, 96 bytes spill stores, 104 bytes spill loads 2025-09-07T06:32:16.7033181Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:32:16.7034224Z #22 401.3 ptxas info : Compile time = 1558.403 ms 2025-09-07T06:32:16.7039373Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.7049039Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.7054245Z #22 401.3 56 bytes stack frame, 228 bytes spill stores, 308 bytes spill loads 2025-09-07T06:32:16.7055460Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:16.7056524Z #22 401.3 
ptxas info : Compile time = 2756.457 ms 2025-09-07T06:32:16.7061482Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.7070764Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.7075854Z #22 401.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:16.7076792Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:16.7077607Z #22 401.3 ptxas info : Compile time = 902.620 ms 2025-09-07T06:32:16.7082795Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.7092392Z #22 401.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.7097896Z #22 401.3 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:32:16.7099117Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:32:16.7100210Z #22 401.3 ptxas info : Compile time = 1089.772 ms 2025-09-07T06:32:16.7105394Z #22 401.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:16.7114904Z #22 401.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:16.7120133Z #22 401.3 56 bytes stack frame, 280 bytes spill stores, 336 bytes spill loads 2025-09-07T06:32:16.7121316Z #22 401.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:16.7122376Z #22 401.3 
ptxas info : Compile time = 2045.890 ms 2025-09-07T06:32:19.1265441Z #22 403.9 [23/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:19.2757422Z #22 403.9 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:32:19.2762715Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2772866Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2778177Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2779335Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:19.2780249Z #22 403.9 ptxas info : Compile time = 1.607 ms 2025-09-07T06:32:19.2785576Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2795353Z #22 403.9 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2800733Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2801892Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:19.2802848Z #22 403.9 ptxas info : Compile time = 0.757 ms 2025-09-07T06:32:19.2808768Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2819324Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2825067Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2826172Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 
2025-09-07T06:32:19.2827138Z #22 403.9 ptxas info : Compile time = 0.695 ms 2025-09-07T06:32:19.2832778Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2843375Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2849261Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2850377Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:19.2851423Z #22 403.9 ptxas info : Compile time = 0.472 ms 2025-09-07T06:32:19.2856610Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2866227Z #22 403.9 ptxas info : Function properties 
for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2871540Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2872667Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:19.2873643Z #22 403.9 ptxas info : Compile time = 20.698 ms 2025-09-07T06:32:19.2879365Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2889374Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2895012Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2896175Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:19.2897142Z #22 403.9 ptxas info : Compile time = 0.824 ms 
2025-09-07T06:32:19.2902473Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2912447Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2918276Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2919396Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:19.2920371Z #22 403.9 ptxas info : Compile time = 0.728 ms 2025-09-07T06:32:19.2925589Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2935374Z #22 403.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2940527Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2941438Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:32:19.2942356Z #22 403.9 ptxas info : Compile time = 0.606 ms 2025-09-07T06:32:19.2951825Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2961778Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2967229Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2968359Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:19.2969297Z #22 403.9 ptxas info : Compile time = 0.572 ms 
2025-09-07T06:32:19.2974852Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:19.2984757Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.2990583Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.2991717Z #22 403.9 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:32:19.2992665Z #22 403.9 ptxas info : Compile time = 0.533 ms 2025-09-07T06:32:19.2993348Z #22 403.9 ptxas info : 11 bytes gmem 2025-09-07T06:32:19.2998472Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3007975Z #22 403.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3013335Z #22 403.9 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:32:19.3014535Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:32:19.3015629Z #22 403.9 ptxas info : Compile time = 750.622 ms 2025-09-07T06:32:19.3021214Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3030958Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3036303Z #22 403.9 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:32:19.3037581Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack size 2025-09-07T06:32:19.3038654Z #22 
403.9 ptxas info : Compile time = 754.111 ms 2025-09-07T06:32:19.3044334Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3055101Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3061169Z #22 403.9 40 bytes stack frame, 116 bytes spill stores, 132 bytes spill loads 2025-09-07T06:32:19.3062408Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:19.3063521Z #22 403.9 ptxas info : Compile time = 951.914 ms 2025-09-07T06:32:19.3069141Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3079322Z #22 403.9 
ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3085059Z #22 403.9 136 bytes stack frame, 340 bytes spill stores, 568 bytes spill loads 2025-09-07T06:32:19.3086354Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 136 bytes cumulative stack size 2025-09-07T06:32:19.3087420Z #22 403.9 ptxas info : Compile time = 1982.269 ms 2025-09-07T06:32:19.3093232Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3102780Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3108038Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.3109088Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers 
2025-09-07T06:32:19.3109936Z #22 403.9 ptxas info : Compile time = 1316.285 ms 2025-09-07T06:32:19.3115372Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3125261Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3131120Z #22 403.9 32 bytes stack frame, 96 bytes spill stores, 104 bytes spill loads 2025-09-07T06:32:19.3132406Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:32:19.3133436Z #22 403.9 ptxas info : Compile time = 1556.694 ms 2025-09-07T06:32:19.3138814Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3149031Z #22 403.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3154479Z #22 403.9 56 bytes stack frame, 228 bytes spill stores, 308 bytes spill loads 2025-09-07T06:32:19.3155731Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:19.3156875Z #22 403.9 ptxas info : Compile time = 2560.555 ms 2025-09-07T06:32:19.3162429Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3172066Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3177340Z #22 403.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:19.3178385Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:19.3179194Z #22 403.9 ptxas info : Compile time = 876.323 ms 
2025-09-07T06:32:19.3184566Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3194106Z #22 403.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3199886Z #22 403.9 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:32:19.3201179Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:32:19.3202292Z #22 403.9 ptxas info : Compile time = 1074.152 ms 2025-09-07T06:32:19.3207628Z #22 403.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:19.3217723Z #22 403.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:19.3223220Z #22 403.9 56 bytes stack frame, 280 bytes spill stores, 336 bytes spill loads 2025-09-07T06:32:19.3224493Z #22 403.9 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:19.3225572Z #22 403.9 ptxas info : Compile time = 2044.348 ms 2025-09-07T06:32:23.8346141Z #22 408.6 [24/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:23.8360864Z #22 408.6 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:32:23.8364707Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8371922Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8376489Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:32:23.8377425Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8378248Z #22 408.6 ptxas info : Compile time = 1.688 ms 2025-09-07T06:32:23.8382630Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8390718Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8395050Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8396067Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8396856Z #22 408.6 ptxas info : Compile time = 0.811 ms 2025-09-07T06:32:23.8402145Z #22 408.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8411175Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8416137Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8417067Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8417880Z #22 408.6 ptxas info : Compile time = 0.825 ms 2025-09-07T06:32:23.8422580Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8431380Z #22 408.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8436357Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8437268Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8438037Z #22 408.6 ptxas info : Compile time = 0.561 ms 2025-09-07T06:32:23.8442723Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8452041Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8456610Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8457569Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:32:23.8458426Z #22 408.6 ptxas info : Compile time 
= 0.566 ms 2025-09-07T06:32:23.8463300Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8471632Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8476264Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8489852Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8494152Z #22 408.6 ptxas info : Compile time = 20.718 ms 2025-09-07T06:32:23.8498786Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8507719Z #22 408.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8512309Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8513269Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8514069Z #22 408.6 ptxas info : Compile time = 0.785 ms 2025-09-07T06:32:23.8518474Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8526677Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8531416Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8532392Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:32:23.8533279Z #22 408.6 ptxas info : Compile time = 0.538 ms 
2025-09-07T06:32:23.8538139Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8546762Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8551648Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8552589Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8553404Z #22 408.6 ptxas info : Compile time = 0.510 ms 2025-09-07T06:32:23.8557969Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:23.8566314Z #22 408.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8571496Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8572383Z #22 408.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:23.8573254Z #22 408.6 ptxas info : Compile time = 0.491 ms 2025-09-07T06:32:23.8573888Z #22 408.6 ptxas info : 11 bytes gmem 2025-09-07T06:32:23.8578284Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8586626Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8591031Z #22 408.6 24 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:32:23.8592176Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack 
size 2025-09-07T06:32:23.8593183Z #22 408.6 ptxas info : Compile time = 842.021 ms 2025-09-07T06:32:23.8597900Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8606087Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8610383Z #22 408.6 40 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads 2025-09-07T06:32:23.8611598Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:23.8612514Z #22 408.6 ptxas info : Compile time = 838.666 ms 2025-09-07T06:32:23.8617392Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8626362Z #22 408.6 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8631441Z #22 408.6 48 bytes stack frame, 96 bytes spill stores, 100 bytes spill loads 2025-09-07T06:32:23.8632595Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:32:23.8633602Z #22 408.6 ptxas info : Compile time = 1044.212 ms 2025-09-07T06:32:23.8638279Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8647333Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8652532Z #22 408.6 120 bytes stack frame, 360 bytes spill stores, 468 bytes spill loads 2025-09-07T06:32:23.8653622Z #22 408.6 
ptxas info : Used 168 registers, used 16 barriers, 120 bytes cumulative stack size 2025-09-07T06:32:23.8654544Z #22 408.6 ptxas info : Compile time = 2066.603 ms 2025-09-07T06:32:23.8660930Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8669330Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8673808Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8674729Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:23.8675522Z #22 408.6 ptxas info : Compile time = 1464.792 ms 2025-09-07T06:32:23.8680149Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8688714Z #22 408.6 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8693756Z #22 408.6 40 bytes stack frame, 92 bytes spill stores, 112 bytes spill loads 2025-09-07T06:32:23.8694818Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:23.8695701Z #22 408.6 ptxas info : Compile time = 1681.837 ms 2025-09-07T06:32:23.8700305Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8708590Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8713265Z #22 408.6 56 bytes stack frame, 184 bytes spill stores, 228 bytes spill loads 2025-09-07T06:32:23.8714312Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack 
size 2025-09-07T06:32:23.8715304Z #22 408.6 ptxas info : Compile time = 2750.211 ms 2025-09-07T06:32:23.8720011Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8728438Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8733111Z #22 408.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:23.8733968Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:32:23.8734794Z #22 408.6 ptxas info : Compile time = 1275.457 ms 2025-09-07T06:32:23.8739309Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.8747563Z #22 408.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.8752761Z #22 408.6 32 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads 2025-09-07T06:32:23.8753758Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:32:23.9832147Z #22 408.6 ptxas info : Compile time = 1184.423 ms 2025-09-07T06:32:23.9837711Z #22 408.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:23.9848047Z #22 408.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:23.9854122Z #22 408.6 56 bytes stack frame, 224 bytes spill stores, 256 bytes spill loads 2025-09-07T06:32:23.9855366Z #22 408.6 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:32:23.9856444Z #22 408.6 ptxas info : Compile time = 2433.297 ms 2025-09-07T06:33:01.6522526Z #22 446.4 [25/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP 
-DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:01.8040651Z #22 446.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:01.8046143Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8056891Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8062385Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8063501Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8064478Z #22 446.4 ptxas info : Compile time = 1.287 ms 2025-09-07T06:33:01.8070220Z #22 446.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8077997Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8081803Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8082889Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8083563Z #22 446.4 ptxas info : Compile time = 0.622 ms 2025-09-07T06:33:01.8087337Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8094509Z #22 446.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8098321Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8099090Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8099731Z #22 446.4 ptxas info : Compile time = 0.536 ms 2025-09-07T06:33:01.8103418Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8110554Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8116128Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8117211Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2624 
bytes cmem[0] 2025-09-07T06:33:01.8118173Z #22 446.4 ptxas info : Compile time = 0.374 ms 2025-09-07T06:33:01.8124105Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8134902Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8140867Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8141990Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8142871Z #22 446.4 ptxas info : Compile time = 0.360 ms 2025-09-07T06:33:01.8148652Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:33:01.8159437Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8165267Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8166119Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8166778Z #22 446.4 ptxas info : Compile time = 0.354 ms 2025-09-07T06:33:01.8170421Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8177691Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8181370Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:01.8182125Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:01.8182776Z #22 446.4 ptxas info : Compile time = 0.375 ms 2025-09-07T06:33:01.8186636Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8193552Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8197727Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8198480Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8199137Z #22 446.4 ptxas info : Compile time = 0.362 ms 2025-09-07T06:33:01.8202921Z #22 446.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:01.8213307Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8219222Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8220291Z #22 446.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:01.8221196Z #22 446.4 ptxas info : Compile time = 0.355 ms 2025-09-07T06:33:01.8221876Z #22 446.4 ptxas info : 11 bytes gmem 2025-09-07T06:33:01.8226954Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8236736Z #22 446.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8242149Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8243125Z #22 446.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:01.8243944Z #22 446.4 ptxas info : Compile time = 539.697 ms 2025-09-07T06:33:01.8250145Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8260604Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8264453Z #22 446.4 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:33:01.8265272Z #22 446.4 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack 
size 2025-09-07T06:33:01.8266007Z #22 446.4 ptxas info : Compile time = 691.716 ms 2025-09-07T06:33:01.8269806Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8276748Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8280631Z #22 446.4 32 bytes stack frame, 64 bytes spill stores, 96 bytes spill loads 2025-09-07T06:33:01.8281479Z #22 446.4 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:01.8282203Z #22 446.4 ptxas info : Compile time = 1745.288 ms 2025-09-07T06:33:01.8285922Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 
for 'sm_90a' 2025-09-07T06:33:01.8293190Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8297092Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8298119Z #22 446.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:01.8299009Z #22 446.4 ptxas info : Compile time = 1233.418 ms 2025-09-07T06:33:01.8304610Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8315629Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8321284Z #22 446.4 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 
2025-09-07T06:33:01.8322505Z #22 446.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:33:01.8323573Z #22 446.4 ptxas info : Compile time = 1397.508 ms 2025-09-07T06:33:01.8329179Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8339941Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8345758Z #22 446.4 48 bytes stack frame, 64 bytes spill stores, 112 bytes spill loads 2025-09-07T06:33:01.8346983Z #22 446.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:01.8348058Z #22 446.4 ptxas info : Compile time = 2749.346 ms 2025-09-07T06:33:01.8353604Z #22 446.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8360306Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8363990Z #22 446.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:01.8364682Z #22 446.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:01.8365261Z #22 446.4 ptxas info : Compile time = 834.901 ms 2025-09-07T06:33:01.8369048Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8376541Z #22 446.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8380343Z #22 446.4 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:33:01.8381161Z #22 446.4 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:33:01.8381886Z #22 446.4 ptxas info : Compile time = 969.248 ms 2025-09-07T06:33:01.8385671Z #22 446.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:01.8394030Z #22 446.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:01.8399635Z #22 446.4 40 bytes stack frame, 68 bytes spill stores, 108 bytes spill loads 2025-09-07T06:33:01.8400864Z #22 446.4 ptxas 
info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:01.8402264Z #22 446.4 ptxas info : Compile time = 2148.399 ms 2025-09-07T06:33:28.7801889Z #22 473.6 [26/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a 
-DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:28.7820485Z #22 473.6 ptxas info : 11 bytes gmem 2025-09-07T06:33:28.7825571Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7834484Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7839556Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7840448Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7841236Z #22 473.6 ptxas info : Compile time = 1.980 ms 2025-09-07T06:33:28.7846120Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7856420Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7861800Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7862633Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7863311Z #22 473.6 ptxas info : Compile time = 0.911 ms 2025-09-07T06:33:28.7867687Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7875682Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7880074Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7880997Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7881675Z #22 473.6 ptxas info : Compile time = 0.607 ms 2025-09-07T06:33:28.7886181Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7894884Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7899718Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7900574Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7901267Z #22 473.6 ptxas info : Compile time = 0.536 ms 2025-09-07T06:33:28.7905740Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7913878Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7917374Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7918089Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7918663Z #22 473.6 ptxas info : Compile time = 0.534 ms 2025-09-07T06:33:28.7922392Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7928529Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7932126Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7932843Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7933435Z #22 473.6 ptxas info : Compile time = 0.510 ms 2025-09-07T06:33:28.7936853Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7943049Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7946469Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7947446Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7948036Z #22 473.6 ptxas info : Compile time = 0.527 ms 2025-09-07T06:33:28.7952113Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7958331Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7961731Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7962433Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7963007Z #22 473.6 ptxas info : Compile time = 0.523 ms 2025-09-07T06:33:28.7966292Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7972728Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7976037Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7976719Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7977316Z #22 473.6 ptxas info : Compile time = 0.587 ms 2025-09-07T06:33:28.7980558Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.7986514Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.7989793Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.7990497Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.7991068Z #22 473.6 ptxas info : Compile time = 0.519 ms 2025-09-07T06:33:28.7994736Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8000932Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8004343Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8005044Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8005633Z #22 473.6 ptxas info : Compile time = 0.519 ms 2025-09-07T06:33:28.8009037Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8015329Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8018729Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8019650Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8020232Z #22 473.6 ptxas info : Compile time = 0.500 ms 2025-09-07T06:33:28.8024241Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8031485Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8035637Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8036353Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8036925Z #22 473.6 ptxas info : Compile time = 0.557 ms 2025-09-07T06:33:28.8040306Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8046676Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8050312Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8051000Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8051678Z #22 473.6 ptxas info : Compile time = 0.498 ms 2025-09-07T06:33:28.8055071Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8061275Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8064675Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8065371Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8065937Z #22 473.6 ptxas info : Compile time = 0.500 ms 2025-09-07T06:33:28.8069310Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8075804Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8079198Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8079900Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8080487Z #22 473.6 ptxas info : Compile time = 0.499 ms 2025-09-07T06:33:28.8083882Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8090067Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8093611Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8094301Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8094894Z #22 473.6 ptxas info : Compile time = 0.550 ms 2025-09-07T06:33:28.8098573Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8104881Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8108302Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8109010Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8109596Z #22 473.6 ptxas info : Compile time = 0.509 ms 2025-09-07T06:33:28.8113011Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8119183Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8122838Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8123525Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8124120Z #22 473.6 ptxas info : Compile time = 0.524 ms 2025-09-07T06:33:28.8127486Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8133817Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8137234Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8137933Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8138507Z #22 473.6 ptxas info : Compile time = 0.513 ms 2025-09-07T06:33:28.8141796Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8147982Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8151768Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8152468Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8153036Z #22 473.6 ptxas info : Compile time = 0.509 ms 2025-09-07T06:33:28.8156285Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8162263Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8165547Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8166242Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8166833Z #22 473.6 ptxas info : Compile time = 0.513 ms 2025-09-07T06:33:28.8168557Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8171909Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8173656Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8174344Z #22 473.6 ptxas info : Used 44 registers, used 0 barriers 2025-09-07T06:33:28.8174935Z #22 473.6 ptxas info : Compile time = 20.045 ms 2025-09-07T06:33:28.8178351Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8184528Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8187951Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8188658Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8189231Z #22 473.6 ptxas info : Compile time = 0.991 ms 2025-09-07T06:33:28.8192661Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8198317Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8201471Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8202181Z #22 473.6 ptxas info : Used 68 registers, used 1 barriers 2025-09-07T06:33:28.8202778Z #22 473.6 ptxas info : Compile time = 35.757 ms 2025-09-07T06:33:28.8205878Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8211634Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8214717Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8215635Z #22 473.6 ptxas info : Used 44 registers, used 1 barriers 2025-09-07T06:33:28.8216232Z #22 473.6 ptxas info : Compile time = 19.093 ms 2025-09-07T06:33:28.8219648Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8225802Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8229229Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8229953Z #22 473.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:28.8230526Z #22 473.6 ptxas info : Compile time = 0.892 ms 2025-09-07T06:33:28.8232264Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:28.8235097Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8236831Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8237535Z #22 473.6 ptxas info : Used 50 registers, used 0 barriers 2025-09-07T06:33:28.8238123Z #22 473.6 ptxas info : Compile time = 21.292 ms 2025-09-07T06:33:28.8238708Z #22 473.6 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:28.8242345Z #22 473.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8248540Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8252357Z #22 473.6 96 bytes stack frame, 96 bytes spill stores, 152 bytes spill loads 2025-09-07T06:33:28.8253345Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8254182Z #22 473.6 ptxas info : Compile time = 642.135 ms 2025-09-07T06:33:28.8257578Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8263708Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8267383Z #22 473.6 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:28.8268363Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8269193Z #22 473.6 ptxas info : Compile time = 590.975 ms 2025-09-07T06:33:28.8272613Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8278801Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8282226Z #22 473.6 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:28.8283202Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8284050Z #22 473.6 ptxas info : Compile time = 651.805 ms 2025-09-07T06:33:28.8287776Z #22 473.6 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8294173Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8297766Z #22 473.6 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:28.8298726Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8299598Z #22 473.6 ptxas info : Compile time = 598.824 ms 2025-09-07T06:33:28.8303001Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8309205Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8312854Z #22 473.6 80 bytes stack frame, 76 bytes spill stores, 100 bytes spill loads 2025-09-07T06:33:28.8313822Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8314663Z #22 473.6 ptxas info : Compile time = 680.288 ms 2025-09-07T06:33:28.8318066Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8324238Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8327641Z #22 473.6 72 bytes stack frame, 72 bytes spill stores, 96 bytes spill loads 2025-09-07T06:33:28.8328609Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8329463Z #22 473.6 ptxas info : Compile time = 657.121 ms 2025-09-07T06:33:28.8333030Z #22 473.6 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8339424Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8343429Z #22 473.6 72 bytes stack frame, 76 bytes spill stores, 80 bytes spill loads 2025-09-07T06:33:28.8344375Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8345228Z #22 473.6 ptxas info : Compile time = 692.101 ms 2025-09-07T06:33:28.8348629Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8355980Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8359387Z #22 473.6 72 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:33:28.8360346Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8361501Z #22 473.6 ptxas info : Compile time = 619.080 ms 2025-09-07T06:33:28.8364781Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8370695Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8374152Z #22 473.6 40 bytes stack frame, 40 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:28.8375096Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8375950Z #22 473.6 ptxas info : Compile time = 641.387 ms 2025-09-07T06:33:28.8379233Z #22 473.6 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8385135Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8388683Z #22 473.6 40 bytes stack frame, 40 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:28.8389662Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8390491Z #22 473.6 ptxas info : Compile time = 613.649 ms 2025-09-07T06:33:28.8393902Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8400045Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8403473Z #22 473.6 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:33:28.8404413Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8405261Z #22 473.6 ptxas info : Compile time = 659.722 ms 2025-09-07T06:33:28.8408633Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8415229Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8418665Z #22 473.6 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:33:28.8419633Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8420473Z #22 473.6 ptxas info : Compile time = 588.939 ms 2025-09-07T06:33:28.8423913Z #22 473.6 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8430070Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8433475Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8434288Z #22 473.6 ptxas info : Used 248 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8434968Z #22 473.6 ptxas info : Compile time = 529.267 ms 2025-09-07T06:33:28.8438580Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8444748Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8448164Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8449240Z #22 473.6 ptxas info : Used 240 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8449932Z #22 473.6 ptxas info : Compile time = 508.713 ms 2025-09-07T06:33:28.8453798Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8459956Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8463641Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8464426Z #22 473.6 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8465087Z #22 473.6 ptxas info : Compile time = 548.169 ms 2025-09-07T06:33:28.8468497Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8474663Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8478078Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8478847Z #22 473.6 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8479536Z #22 473.6 ptxas info : Compile time = 516.422 ms 2025-09-07T06:33:28.8482930Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8490491Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8495088Z #22 473.6 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:28.8496349Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8497384Z #22 473.6 ptxas info : Compile time = 647.669 ms 2025-09-07T06:33:28.8501776Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8509689Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8514076Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8515350Z #22 473.6 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8516235Z #22 473.6 ptxas info : Compile time = 553.598 ms 2025-09-07T06:33:28.8520703Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8528768Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8533900Z #22 473.6 24 bytes stack frame, 28 bytes spill stores, 32 bytes spill loads 2025-09-07T06:33:28.8535146Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8536326Z #22 473.6 ptxas info : Compile time = 665.321 ms 2025-09-07T06:33:28.8541214Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8549987Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8554596Z #22 473.6 24 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:33:28.8555907Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:28.8557046Z #22 473.6 ptxas info : Compile time = 620.265 ms 2025-09-07T06:33:28.8561736Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8570200Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8574685Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8575623Z #22 473.6 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8576515Z #22 473.6 ptxas info : Compile time = 588.154 ms 2025-09-07T06:33:28.8580852Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8587842Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8591168Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8591960Z #22 473.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8592636Z #22 473.6 ptxas info : Compile time = 512.206 ms 2025-09-07T06:33:28.8594419Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8597250Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8598988Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8599772Z #22 473.6 ptxas info : Used 47 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:28.8600431Z #22 473.6 ptxas info : Compile time = 15.957 ms 2025-09-07T06:33:28.8604169Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8610359Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8613984Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8614759Z #22 473.6 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8615450Z #22 473.6 ptxas info : Compile time = 560.350 ms 2025-09-07T06:33:28.8618611Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8624242Z #22 473.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8627378Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8629809Z #22 473.6 ptxas info : Used 71 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:28.8630471Z #22 473.6 ptxas info : Compile time = 23.271 ms 2025-09-07T06:33:28.8633548Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8639041Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8642165Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8642951Z #22 473.6 ptxas info : Used 47 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:28.8643613Z #22 473.6 ptxas info : Compile time = 13.893 ms 2025-09-07T06:33:28.8647018Z #22 473.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8653809Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:28.8657237Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8658000Z #22 473.6 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:28.8658680Z #22 473.6 ptxas info : Compile time = 521.937 ms 2025-09-07T06:33:28.8660393Z #22 473.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:28.8663189Z #22 473.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:28.8664947Z #22 473.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:28.8665709Z #22 473.6 ptxas info : Used 52 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:28.8666394Z #22 473.6 ptxas info : Compile time = 18.523 ms 2025-09-07T06:33:32.0518248Z #22 476.8 [27/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:32.2091413Z #22 476.8 ptxas info : 11 bytes 
gmem 2025-09-07T06:33:32.2096217Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2104747Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2109820Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2110754Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2111567Z #22 476.8 ptxas info : Compile time = 1.987 ms 2025-09-07T06:33:32.2116242Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2124857Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2129484Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2130417Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2131383Z #22 476.8 ptxas info : Compile time = 0.921 ms 2025-09-07T06:33:32.2136010Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2144827Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2149774Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2150700Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2151521Z #22 476.8 ptxas info : Compile time = 0.613 ms 2025-09-07T06:33:32.2156072Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2164685Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2169575Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2170555Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2171519Z #22 476.8 ptxas info : Compile time = 0.617 ms 2025-09-07T06:33:32.2176801Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2185690Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2190508Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2191524Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2192330Z #22 476.8 ptxas info : Compile time = 0.638 ms 2025-09-07T06:33:32.2197160Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2205932Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2211207Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2212194Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2212995Z #22 476.8 ptxas info : Compile time = 0.607 ms 2025-09-07T06:33:32.2217813Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2226695Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2231583Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2232549Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2233388Z #22 476.8 ptxas info : Compile time = 0.631 ms 2025-09-07T06:33:32.2238189Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2247237Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2252935Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2253893Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2254806Z #22 476.8 ptxas info : Compile time = 0.621 ms 2025-09-07T06:33:32.2259345Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2267642Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2272266Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2273193Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2274000Z #22 476.8 ptxas info : Compile time = 0.717 ms 2025-09-07T06:33:32.2278381Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2287034Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2301486Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2302510Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2303318Z #22 476.8 ptxas info : Compile time = 0.652 ms 2025-09-07T06:33:32.2308108Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2316976Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:32.2321854Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2322843Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2323638Z #22 476.8 ptxas info : Compile time = 0.633 ms 2025-09-07T06:33:32.2328838Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2337815Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2342713Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2343682Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2344484Z #22 476.8 ptxas info : Compile time = 0.558 ms 2025-09-07T06:33:32.2349590Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2358342Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2363405Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2364383Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2365183Z #22 476.8 ptxas info : Compile time = 0.585 ms 2025-09-07T06:33:32.2369829Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2378815Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2383646Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2384609Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2385443Z #22 476.8 ptxas info : Compile time = 0.520 ms 2025-09-07T06:33:32.2390238Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2399292Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2404179Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2405159Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2405959Z #22 476.8 ptxas info : Compile time = 0.523 ms 2025-09-07T06:33:32.2410859Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2419696Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2424540Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2425528Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2426345Z #22 476.8 ptxas info : Compile time = 0.534 ms 2025-09-07T06:33:32.2431399Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2440138Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2445041Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2446048Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2446858Z #22 476.8 ptxas info : Compile time = 0.548 ms 2025-09-07T06:33:32.2452093Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2460955Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2465867Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2467116Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2467945Z #22 476.8 ptxas info : Compile time = 0.527 ms 2025-09-07T06:33:32.2472885Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2481797Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2486638Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2487608Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2488440Z #22 476.8 ptxas info : Compile time = 0.561 ms 2025-09-07T06:33:32.2493397Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2501181Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2506027Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2507021Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2507819Z #22 476.8 ptxas info : Compile time = 0.607 ms 2025-09-07T06:33:32.2512492Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2521049Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2525764Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2526724Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2527523Z #22 476.8 ptxas info : Compile time = 0.608 ms 2025-09-07T06:33:32.2532517Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2540979Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2545606Z 
#22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2546582Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2547360Z #22 476.8 ptxas info : Compile time = 0.594 ms 2025-09-07T06:33:32.2550029Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2553966Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:32.2556433Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2557408Z #22 476.8 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:33:32.2558220Z #22 476.8 ptxas info : Compile time = 42.046 ms 2025-09-07T06:33:32.2563033Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2572340Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2577243Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:32.2578206Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2578987Z #22 476.8 ptxas info : Compile time = 1.091 ms 2025-09-07T06:33:32.2583326Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2591380Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:32.2595794Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2596773Z #22 476.8 ptxas info : Used 68 registers, used 1 barriers 2025-09-07T06:33:32.2597606Z #22 476.8 ptxas info : Compile time = 63.695 ms 2025-09-07T06:33:32.2602219Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2610011Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:32.2614605Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2615614Z #22 476.8 ptxas info : Used 44 registers, used 1 barriers 2025-09-07T06:33:32.2616440Z #22 476.8 ptxas info : Compile time = 41.405 ms 2025-09-07T06:33:32.2621055Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2629793Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2634754Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2635739Z #22 476.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:32.2636553Z #22 476.8 ptxas info : Compile time = 0.975 ms 2025-09-07T06:33:32.2638950Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.2643579Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:32.2646005Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2646998Z #22 476.8 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:33:32.2647834Z #22 476.8 ptxas info : Compile time = 66.980 ms 2025-09-07T06:33:32.2648629Z #22 476.8 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:32.2653605Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2661970Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2666862Z #22 476.8 96 bytes stack frame, 96 bytes spill stores, 152 bytes spill loads 2025-09-07T06:33:32.2668523Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2669742Z #22 476.8 ptxas info : Compile time = 653.423 ms 
2025-09-07T06:33:32.2674576Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2686952Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2692080Z #22 476.8 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:32.2693417Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2694614Z #22 476.8 ptxas info : Compile time = 600.537 ms 2025-09-07T06:33:32.2699445Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2708473Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2713333Z #22 476.8 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:32.2714681Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2715865Z #22 476.8 ptxas info : Compile time = 660.214 ms 2025-09-07T06:33:32.2720587Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2729337Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2734292Z #22 476.8 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:32.2735663Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2736872Z #22 476.8 ptxas info : Compile time = 620.536 ms 2025-09-07T06:33:32.2741874Z #22 476.8 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2750856Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2755701Z #22 476.8 80 bytes stack frame, 76 bytes spill stores, 100 bytes spill loads 2025-09-07T06:33:32.2757060Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2758250Z #22 476.8 ptxas info : Compile time = 705.485 ms 2025-09-07T06:33:32.2763065Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2772016Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2777081Z #22 476.8 72 bytes stack frame, 72 bytes spill stores, 96 bytes spill loads 2025-09-07T06:33:32.2778477Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2779662Z #22 476.8 ptxas info : Compile time = 655.719 ms 2025-09-07T06:33:32.2784288Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2793019Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2797861Z #22 476.8 72 bytes stack frame, 76 bytes spill stores, 80 bytes spill loads 2025-09-07T06:33:32.2799234Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2800447Z #22 476.8 ptxas info : Compile time = 715.929 ms 2025-09-07T06:33:32.2805252Z #22 476.8 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2814437Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2819231Z #22 476.8 72 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:33:32.2820595Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2821786Z #22 476.8 ptxas info : Compile time = 643.944 ms 2025-09-07T06:33:32.2826465Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2834936Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2839645Z #22 476.8 40 bytes stack frame, 40 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:32.2841174Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2842395Z #22 476.8 ptxas info : Compile time = 655.254 ms 2025-09-07T06:33:32.2847104Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2855772Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2860507Z #22 476.8 40 bytes stack frame, 40 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:32.2861881Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2863068Z #22 476.8 ptxas info : Compile time = 626.101 ms 2025-09-07T06:33:32.2867974Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2877118Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2881976Z #22 476.8 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:33:32.2883298Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2884504Z #22 476.8 ptxas info : Compile time = 668.843 ms 2025-09-07T06:33:32.2889319Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2898163Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2903000Z #22 476.8 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:33:32.2904378Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.2905574Z #22 476.8 ptxas info : Compile time = 631.606 ms 2025-09-07T06:33:32.2910469Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2920180Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2925055Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2926179Z #22 476.8 ptxas info : Used 248 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.2927147Z #22 476.8 ptxas info : Compile time = 539.736 ms 2025-09-07T06:33:32.2932188Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2941018Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2945889Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2946987Z #22 476.8 ptxas info : Used 240 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.2947943Z #22 476.8 ptxas info : Compile time = 524.743 ms 2025-09-07T06:33:32.2953310Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2962181Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2966965Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2968060Z #22 476.8 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.2968984Z #22 476.8 ptxas info : Compile time = 579.883 ms 2025-09-07T06:33:32.2973875Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.2982624Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.2987700Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.2988785Z #22 476.8 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.2989698Z #22 476.8 ptxas info : Compile time = 520.299 ms 2025-09-07T06:33:32.2994517Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3003391Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3008261Z #22 476.8 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:32.3009620Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.3010851Z #22 476.8 ptxas info : Compile time = 670.626 ms 2025-09-07T06:33:32.3015948Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3024708Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3029553Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3030660Z #22 476.8 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.3031624Z #22 476.8 ptxas info : Compile time = 560.357 ms 2025-09-07T06:33:32.3036552Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3045278Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3050313Z #22 476.8 24 bytes stack frame, 28 bytes spill stores, 32 bytes spill loads 2025-09-07T06:33:32.3051947Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.3053132Z #22 476.8 ptxas info : Compile time = 688.920 ms 2025-09-07T06:33:32.3057853Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3066576Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3071418Z #22 476.8 24 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:33:32.3072761Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:32.3073988Z #22 476.8 ptxas info : Compile time = 645.703 ms 2025-09-07T06:33:32.3078759Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3087496Z #22 476.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3092282Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3093392Z #22 476.8 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.3094326Z #22 476.8 ptxas info : Compile time = 590.406 ms 2025-09-07T06:33:32.3098966Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3107480Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3112165Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3113282Z #22 476.8 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.3114245Z #22 476.8 ptxas info : Compile time = 531.887 ms 2025-09-07T06:33:32.3116656Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3120752Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:32.3123073Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3124134Z #22 476.8 ptxas info : Used 48 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:32.3125102Z #22 476.8 ptxas info : Compile time = 20.626 ms 2025-09-07T06:33:32.3129947Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3138754Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3143559Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3144663Z #22 476.8 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.3145617Z #22 476.8 ptxas info : Compile time = 574.900 ms 2025-09-07T06:33:32.3150277Z #22 476.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3158476Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:32.3162908Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3164001Z #22 476.8 ptxas info : Used 71 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:32.3164955Z #22 476.8 ptxas info : Compile time = 28.026 ms 2025-09-07T06:33:32.3169339Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3177367Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:32.3181655Z 
#22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3182751Z #22 476.8 ptxas info : Used 47 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:32.3183986Z #22 476.8 ptxas info : Compile time = 17.773 ms 2025-09-07T06:33:32.3188714Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3197359Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.3202213Z #22 476.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3203310Z #22 476.8 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:32.3204275Z #22 476.8 ptxas info : Compile time = 541.925 ms 2025-09-07T06:33:32.3206724Z #22 476.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.3210703Z #22 476.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:32.3213346Z #22 476.8 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads 2025-09-07T06:33:32.3214454Z #22 476.8 ptxas info : Used 51 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:32.3215395Z #22 476.8 ptxas info : Compile time = 24.674 ms 2025-09-07T06:33:35.5350806Z #22 480.3 [28/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:35.6908122Z #22 480.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:35.6913475Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.6923512Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.6928732Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.6929859Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.6930795Z #22 480.3 ptxas info : Compile time = 1.820 ms 2025-09-07T06:33:35.6936150Z #22 480.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.6946163Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.6951712Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.6952821Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.6953757Z #22 480.3 ptxas info : Compile time = 0.851 ms 2025-09-07T06:33:35.6959205Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.6969375Z #22 480.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.6975048Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.6976111Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.6977068Z #22 480.3 ptxas info : Compile time = 0.808 ms 2025-09-07T06:33:35.6982926Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.6993038Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.6998631Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.6999739Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:33:35.7015169Z #22 480.3 ptxas info : Compile time = 0.573 ms 2025-09-07T06:33:35.7020718Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.7031157Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7036705Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.7037797Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:35.7038747Z #22 480.3 ptxas info : Compile time = 0.566 ms 2025-09-07T06:33:35.7044298Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:35.7055074Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7060777Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.7061861Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.7062808Z #22 480.3 ptxas info : Compile time = 0.605 ms 2025-09-07T06:33:35.7068800Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.7079235Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7084929Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads 2025-09-07T06:33:35.7086025Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.7086984Z #22 480.3 ptxas info : Compile time = 0.612 ms 2025-09-07T06:33:35.7092522Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.7102779Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7108271Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.7109401Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:35.7110339Z #22 480.3 ptxas info : Compile time = 0.559 ms 2025-09-07T06:33:35.7115842Z #22 480.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.7126053Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7131810Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.7132886Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.7133850Z #22 480.3 ptxas info : Compile time = 0.537 ms 2025-09-07T06:33:35.7139738Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:35.7150234Z #22 480.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7159702Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.7160863Z #22 480.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:35.7161797Z #22 480.3 ptxas info : Compile time = 0.532 ms 2025-09-07T06:33:35.7162532Z #22 480.3 ptxas info : 11 bytes gmem 2025-09-07T06:33:35.7168122Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7177975Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7183146Z #22 480.3 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:33:35.7184394Z #22 480.3 ptxas info : Used 168 registers, used 9 barriers, 24 
bytes cumulative stack size 2025-09-07T06:33:35.7185444Z #22 480.3 ptxas info : Compile time = 660.462 ms 2025-09-07T06:33:35.7190657Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7200173Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7205492Z #22 480.3 32 bytes stack frame, 100 bytes spill stores, 104 bytes spill loads 2025-09-07T06:33:35.7206746Z #22 480.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:35.7207817Z #22 480.3 ptxas info : Compile time = 638.257 ms 2025-09-07T06:33:35.7213899Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7224137Z 
#22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7229763Z #22 480.3 48 bytes stack frame, 120 bytes spill stores, 128 bytes spill loads 2025-09-07T06:33:35.7231012Z #22 480.3 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:35.7232085Z #22 480.3 ptxas info : Compile time = 798.375 ms 2025-09-07T06:33:35.7237638Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7248091Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7254127Z #22 480.3 56 bytes stack frame, 272 bytes spill stores, 300 bytes spill loads 
2025-09-07T06:33:35.7266039Z #22 480.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:35.7267151Z #22 480.3 ptxas info : Compile time = 1672.951 ms 2025-09-07T06:33:35.7272628Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7282728Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7288274Z #22 480.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:35.7289250Z #22 480.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:35.7290086Z #22 480.3 ptxas info : Compile time = 1148.337 ms 2025-09-07T06:33:35.7296214Z #22 480.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7306583Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7312317Z #22 480.3 48 bytes stack frame, 144 bytes spill stores, 160 bytes spill loads 2025-09-07T06:33:35.7313561Z #22 480.3 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:35.7314679Z #22 480.3 ptxas info : Compile time = 1340.816 ms 2025-09-07T06:33:35.7320340Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7331282Z #22 480.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7337024Z #22 480.3 56 bytes stack frame, 160 bytes spill stores, 204 bytes spill loads 2025-09-07T06:33:35.7338279Z #22 480.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:35.7339361Z #22 480.3 ptxas info : Compile time = 2520.151 ms 2025-09-07T06:33:35.7344789Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7354987Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7360435Z #22 480.3 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:33:35.7370113Z #22 480.3 ptxas info : Used 168 registers, used 9 
barriers, 8 bytes cumulative stack size 2025-09-07T06:33:35.7371531Z #22 480.3 ptxas info : Compile time = 987.581 ms 2025-09-07T06:33:35.7377121Z #22 480.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7387298Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7393046Z #22 480.3 32 bytes stack frame, 88 bytes spill stores, 104 bytes spill loads 2025-09-07T06:33:35.7394305Z #22 480.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:35.7395405Z #22 480.3 ptxas info : Compile time = 1134.689 ms 2025-09-07T06:33:35.7401032Z #22 480.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:35.7411866Z #22 480.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:35.7418124Z #22 480.3 40 bytes stack frame, 264 bytes spill stores, 296 bytes spill loads 2025-09-07T06:33:35.7419405Z #22 480.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:35.7420453Z #22 480.3 ptxas info : Compile time = 2215.476 ms 2025-09-07T06:33:44.4983935Z #22 489.3 [29/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:44.6501499Z #22 489.3 ptxas info : 11 bytes gmem 2025-09-07T06:33:44.6506597Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6515461Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6520971Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6522010Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6522856Z #22 489.3 ptxas info : Compile time = 2.022 ms 2025-09-07T06:33:44.6528102Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6537439Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6542439Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6543511Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6544345Z #22 489.3 ptxas info : Compile time = 0.951 ms 2025-09-07T06:33:44.6549610Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6558424Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6563791Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6564793Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6565666Z #22 489.3 ptxas info : Compile time = 0.590 ms 2025-09-07T06:33:44.6570663Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6579633Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:44.6584673Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6585715Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6586564Z #22 489.3 ptxas info : Compile time = 0.540 ms 2025-09-07T06:33:44.6591554Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6600836Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6605900Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6606951Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6607790Z #22 489.3 ptxas info : Compile time = 0.527 ms 2025-09-07T06:33:44.6612937Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6621692Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6626701Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6627712Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6628562Z #22 489.3 ptxas info : Compile time = 0.552 ms 2025-09-07T06:33:44.6633401Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6642709Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6647407Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6648236Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6649161Z #22 489.3 ptxas info : Compile time = 0.546 ms 2025-09-07T06:33:44.6653970Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6662824Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6667705Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6668744Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6669543Z #22 489.3 ptxas info : Compile time = 0.537 ms 2025-09-07T06:33:44.6674462Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6683200Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6688060Z #22 489.3 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6689089Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6689939Z #22 489.3 ptxas info : Compile time = 0.522 ms 2025-09-07T06:33:44.6694899Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6703607Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6708723Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6709784Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6710625Z #22 489.3 ptxas info : Compile time = 0.576 ms 2025-09-07T06:33:44.6713074Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6716996Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.6719463Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6720499Z #22 489.3 ptxas info : Used 39 registers, used 0 
barriers 2025-09-07T06:33:44.6721362Z #22 489.3 ptxas info : Compile time = 19.375 ms 2025-09-07T06:33:44.6726389Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6735628Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6740509Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6741511Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6742357Z #22 489.3 ptxas info : Compile time = 0.879 ms 2025-09-07T06:33:44.6747024Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6754362Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.6758799Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6759836Z #22 489.3 ptxas info : Used 40 registers, used 1 barriers 2025-09-07T06:33:44.6760590Z #22 489.3 ptxas info : Compile time = 17.143 ms 2025-09-07T06:33:44.6765330Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6774338Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6779536Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6780541Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6781939Z #22 489.3 ptxas info : Compile time = 0.992 ms 2025-09-07T06:33:44.6784093Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:33:44.6787784Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:44.6790143Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6791169Z #22 489.3 ptxas info : Used 43 registers, used 0 barriers 2025-09-07T06:33:44.6792015Z #22 489.3 ptxas info : Compile time = 27.109 ms 2025-09-07T06:33:44.6796974Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6805822Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6810678Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6812273Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6813085Z #22 489.3 ptxas info : Compile time = 0.987 ms 2025-09-07T06:33:44.6818091Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6826964Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6831884Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6832928Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6833756Z #22 489.3 ptxas info : Compile time = 0.808 ms 2025-09-07T06:33:44.6838512Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6847448Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:44.6852603Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6853657Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6854457Z #22 489.3 ptxas info : Compile time = 0.788 ms 2025-09-07T06:33:44.6859312Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6868465Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6873422Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6874475Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6875364Z #22 489.3 ptxas info : Compile time = 0.699 ms 2025-09-07T06:33:44.6880571Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6889342Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6894472Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6895485Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6896339Z #22 489.3 ptxas info : Compile time = 0.637 ms 2025-09-07T06:33:44.6901236Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6910242Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6915168Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6916478Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6917475Z #22 489.3 ptxas info : Compile time = 0.630 ms 2025-09-07T06:33:44.6922238Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6931350Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6936317Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6937345Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6938216Z #22 489.3 ptxas info : Compile time = 0.656 ms 2025-09-07T06:33:44.6943087Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6952714Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:44.6957916Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6958949Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6959742Z #22 489.3 ptxas info : Compile time = 0.529 ms 2025-09-07T06:33:44.6964472Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6973022Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6977904Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6978905Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6979768Z #22 489.3 ptxas info : Compile time = 0.533 ms 2025-09-07T06:33:44.6984338Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.6993227Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.6998061Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.6999081Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.6999902Z #22 489.3 ptxas info : Compile time = 0.577 ms 2025-09-07T06:33:44.7002480Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.7006738Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7009370Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7010425Z #22 489.3 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:33:44.7011428Z #22 489.3 ptxas info : Compile time = 29.569 ms 2025-09-07T06:33:44.7016492Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.7025788Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7030597Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7031558Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.7033132Z #22 489.3 ptxas info : Compile time = 0.887 ms 2025-09-07T06:33:44.7037483Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.7045853Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7050487Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7051628Z #22 489.3 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:33:44.7052499Z #22 489.3 ptxas info : Compile time = 22.948 ms 2025-09-07T06:33:44.7057418Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.7066525Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7071357Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7072399Z #22 489.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:44.7073242Z #22 489.3 ptxas info : Compile time = 0.890 ms 2025-09-07T06:33:44.7075781Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:44.7079831Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7082392Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7083429Z #22 489.3 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:33:44.7084272Z #22 489.3 ptxas info : Compile time = 75.207 ms 2025-09-07T06:33:44.7085100Z #22 489.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:44.7090125Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7099550Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7104381Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7105510Z #22 489.3 ptxas info : Used 249 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7106512Z #22 489.3 ptxas info : Compile time = 417.845 ms 2025-09-07T06:33:44.7111396Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7120149Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7125076Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7126259Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7127402Z #22 489.3 ptxas info : Compile time = 380.967 ms 2025-09-07T06:33:44.7132546Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7141310Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7146296Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7147434Z #22 489.3 ptxas info : Used 252 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7148441Z #22 489.3 ptxas info : Compile time = 450.471 ms 2025-09-07T06:33:44.7153474Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7162463Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7167415Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7168811Z #22 489.3 ptxas info : Used 252 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7169829Z #22 489.3 ptxas info : Compile time = 413.255 ms 2025-09-07T06:33:44.7174840Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7183870Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7188780Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7189857Z #22 489.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7190829Z #22 489.3 ptxas info : Compile time = 480.900 ms 2025-09-07T06:33:44.7195785Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7204944Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7209985Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7211306Z #22 489.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7212309Z #22 489.3 ptxas info : Compile time = 423.849 ms 2025-09-07T06:33:44.7217278Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7226303Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7231108Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7232274Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7233274Z #22 489.3 ptxas info : Compile time = 468.930 ms 2025-09-07T06:33:44.7238261Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7247161Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7252517Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7253669Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7254643Z #22 489.3 ptxas info : Compile time = 424.649 ms 2025-09-07T06:33:44.7259382Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7268163Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7273234Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7274410Z #22 489.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7275334Z #22 489.3 ptxas info : Compile time = 433.234 ms 2025-09-07T06:33:44.7280006Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7289233Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7294116Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7295319Z #22 489.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7296330Z #22 489.3 ptxas info : Compile time = 390.741 ms 2025-09-07T06:33:44.7298742Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7302651Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7305247Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7306423Z #22 489.3 ptxas info : Used 39 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:44.7307497Z #22 489.3 ptxas info : Compile time = 20.442 ms 2025-09-07T06:33:44.7312823Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7321916Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7326935Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7328083Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7329081Z #22 489.3 ptxas info : Compile time = 469.952 ms 2025-09-07T06:33:44.7333712Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7341765Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7346481Z #22 
489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7347659Z #22 489.3 ptxas info : Used 40 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:44.7348641Z #22 489.3 ptxas info : Compile time = 16.049 ms 2025-09-07T06:33:44.7353598Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7362400Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7367291Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7368429Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:44.7369427Z #22 489.3 ptxas info : Compile time = 401.902 ms 2025-09-07T06:33:44.7372013Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7375924Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7378336Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:44.7379471Z #22 489.3 ptxas info : Used 45 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:44.7380777Z #22 489.3 ptxas info : Compile time = 19.597 ms 2025-09-07T06:33:44.7385739Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7394407Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7399256Z #22 489.3 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:33:44.7400697Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7401909Z #22 489.3 ptxas info : Compile time = 721.347 ms 2025-09-07T06:33:44.7406557Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7413871Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7417869Z #22 489.3 64 bytes stack frame, 56 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:44.7419198Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7420447Z #22 489.3 ptxas info : Compile time = 714.744 ms 2025-09-07T06:33:44.7425320Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7434280Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7439213Z #22 489.3 72 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:44.7440625Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7441839Z #22 489.3 ptxas info : Compile time = 742.926 ms 2025-09-07T06:33:44.7446952Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7455531Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7460020Z #22 489.3 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:44.7461458Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7462696Z #22 489.3 ptxas info : Compile time = 708.925 ms 2025-09-07T06:33:44.7467518Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7476358Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7481674Z #22 489.3 88 bytes stack frame, 92 bytes spill stores, 124 bytes spill loads 2025-09-07T06:33:44.7483066Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7484352Z #22 489.3 ptxas info : Compile time = 812.269 ms 2025-09-07T06:33:44.7488963Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7497598Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7502446Z #22 489.3 88 bytes stack frame, 84 bytes spill stores, 112 bytes spill loads 2025-09-07T06:33:44.7503881Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7505086Z #22 489.3 ptxas info : Compile time = 752.361 ms 2025-09-07T06:33:44.7509986Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7519365Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7524372Z #22 489.3 104 bytes stack frame, 104 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:44.7525821Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7527101Z #22 489.3 ptxas info : Compile time = 826.389 ms 2025-09-07T06:33:44.7532177Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7541777Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7546747Z #22 489.3 104 bytes stack frame, 104 bytes spill stores, 116 bytes spill loads 2025-09-07T06:33:44.7548228Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7549826Z #22 489.3 ptxas info : Compile time = 793.239 ms 2025-09-07T06:33:44.7554567Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7562656Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7567328Z #22 489.3 88 bytes stack frame, 88 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:44.7568760Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7570026Z #22 489.3 ptxas info : Compile time = 755.215 ms 2025-09-07T06:33:44.7574912Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7583393Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7587746Z #22 489.3 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:44.7589396Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7590628Z #22 489.3 ptxas info : Compile time = 714.946 ms 2025-09-07T06:33:44.7593048Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7597019Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7599534Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7600647Z #22 489.3 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:44.7601575Z #22 489.3 ptxas info : Compile time = 30.579 ms 2025-09-07T06:33:44.7606520Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7615303Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7620516Z #22 489.3 96 bytes stack frame, 96 bytes spill stores, 100 bytes spill loads 2025-09-07T06:33:44.7621917Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7623058Z #22 489.3 ptxas info : Compile time = 763.142 ms 2025-09-07T06:33:44.7627467Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7635227Z #22 489.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7639720Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7640820Z #22 489.3 ptxas info : Used 56 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:44.7641791Z #22 489.3 ptxas info : Compile time = 21.941 ms 2025-09-07T06:33:44.7646366Z #22 489.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7655548Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:44.7660382Z #22 489.3 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:33:44.7661749Z #22 489.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:44.7662887Z #22 489.3 ptxas info : Compile time = 736.603 ms 2025-09-07T06:33:44.7665306Z #22 489.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:44.7669314Z #22 489.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:44.7671824Z #22 489.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:44.7672935Z #22 489.3 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:44.7673887Z #22 489.3 ptxas info : Compile time = 32.969 ms 2025-09-07T06:33:49.0182150Z #22 493.8 [30/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 
--ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:49.1748430Z #22 493.8 ptxas info : 11 bytes gmem 2025-09-07T06:33:49.1753890Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1763106Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1768071Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1769016Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1769802Z #22 493.8 ptxas info : Compile time = 2.251 ms 2025-09-07T06:33:49.1774739Z #22 493.8 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1783320Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1788151Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1789110Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1789896Z #22 493.8 ptxas info : Compile time = 1.081 ms 2025-09-07T06:33:49.1794627Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1803774Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1808423Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1809340Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1810128Z #22 493.8 ptxas info : Compile time = 0.716 ms 2025-09-07T06:33:49.1815085Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1823689Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1828440Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1829368Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1830186Z #22 493.8 ptxas info : Compile time = 0.638 ms 2025-09-07T06:33:49.1835216Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1843793Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1848679Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1850171Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1850984Z #22 493.8 ptxas info : Compile time = 0.642 ms 2025-09-07T06:33:49.1855906Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1864392Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1869821Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1870760Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1871549Z #22 493.8 ptxas info : Compile time = 0.631 ms 2025-09-07T06:33:49.1876193Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1884787Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1889663Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1890638Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1891631Z #22 493.8 ptxas info : Compile time = 0.681 ms 2025-09-07T06:33:49.1896230Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1905112Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1909879Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1910837Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1911631Z #22 493.8 ptxas info : Compile time = 0.605 ms 2025-09-07T06:33:49.1916067Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1924496Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:49.1929141Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1930041Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1930819Z #22 493.8 ptxas info : Compile time = 0.648 ms 2025-09-07T06:33:49.1935823Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1944186Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1949130Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1950139Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1950956Z #22 493.8 ptxas info : Compile time = 0.542 ms 2025-09-07T06:33:49.1953278Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1957173Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.1959580Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:49.1960489Z #22 493.8 ptxas info : Used 39 registers, used 0 barriers 2025-09-07T06:33:49.1961272Z #22 493.8 ptxas info : Compile time = 20.482 ms 2025-09-07T06:33:49.1966062Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1975776Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.1980613Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1981538Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.1982341Z #22 493.8 ptxas info : Compile time = 0.978 ms 2025-09-07T06:33:49.1986615Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.1994239Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.1998511Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.1999456Z #22 493.8 ptxas info : Used 40 registers, used 1 barriers 2025-09-07T06:33:49.2000641Z #22 493.8 ptxas info : Compile time = 19.587 ms 2025-09-07T06:33:49.2005265Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2014042Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2018842Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2019774Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2020532Z #22 493.8 ptxas info : Compile time = 0.871 ms 2025-09-07T06:33:49.2022940Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2026804Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2029232Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2030206Z #22 493.8 ptxas info : Used 42 registers, used 0 barriers 2025-09-07T06:33:49.2031000Z #22 493.8 ptxas info : Compile time = 23.261 ms 2025-09-07T06:33:49.2035936Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2044612Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2049667Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2050608Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2051548Z #22 493.8 ptxas info : Compile time = 1.004 ms 2025-09-07T06:33:49.2056256Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2064831Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2070102Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2071021Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2071844Z #22 493.8 ptxas info : Compile time = 0.804 ms 2025-09-07T06:33:49.2076612Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2085172Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2089864Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2090827Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2091803Z #22 493.8 ptxas info : Compile time = 0.763 ms 2025-09-07T06:33:49.2096508Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2105527Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2110299Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2111258Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2112041Z #22 493.8 ptxas info : Compile time = 0.589 ms 2025-09-07T06:33:49.2116751Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2125337Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2130197Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2131347Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2132082Z #22 493.8 ptxas info : Compile time = 0.546 ms 2025-09-07T06:33:49.2136814Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2145655Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2150608Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2151596Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2152398Z #22 493.8 ptxas info : Compile time = 0.526 ms 2025-09-07T06:33:49.2157084Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2165695Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2170530Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2171667Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2172485Z #22 493.8 ptxas info : Compile time = 0.537 ms 2025-09-07T06:33:49.2177521Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2186109Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2190932Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2191875Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2192670Z #22 493.8 ptxas info : Compile time = 0.524 ms 2025-09-07T06:33:49.2197290Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2205547Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:49.2210579Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2211731Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2212510Z #22 493.8 ptxas info : Compile time = 0.553 ms 2025-09-07T06:33:49.2217048Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2225834Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2230494Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2231441Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2232244Z #22 493.8 ptxas info : Compile time = 0.589 ms 2025-09-07T06:33:49.2234721Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2238735Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2241296Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:49.2242267Z #22 493.8 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:33:49.2243322Z #22 493.8 ptxas info : Compile time = 34.804 ms 2025-09-07T06:33:49.2248112Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2257096Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2261907Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2262880Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2263634Z #22 493.8 ptxas info : Compile time = 1.010 ms 2025-09-07T06:33:49.2267956Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2275888Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2280646Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2281603Z #22 493.8 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:33:49.2282412Z #22 493.8 ptxas info : Compile time = 48.723 ms 2025-09-07T06:33:49.2288127Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2297071Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2301964Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2302891Z #22 493.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:49.2303687Z #22 493.8 ptxas info : Compile time = 1.022 ms 2025-09-07T06:33:49.2306107Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:49.2310293Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2313205Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2314156Z #22 493.8 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:33:49.2314955Z #22 493.8 ptxas info : Compile time = 74.486 ms 2025-09-07T06:33:49.2315653Z #22 493.8 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:49.2320484Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2329187Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2334147Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2335221Z #22 493.8 ptxas info : Used 249 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2336163Z #22 493.8 ptxas info : Compile time = 395.984 ms 2025-09-07T06:33:49.2340882Z #22 493.8 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2350121Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2354752Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2355852Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2356791Z #22 493.8 ptxas info : Compile time = 372.202 ms 2025-09-07T06:33:49.2361435Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2370059Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2374940Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2376019Z #22 493.8 ptxas info : Used 252 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2376967Z #22 493.8 ptxas info : Compile time = 430.566 ms 2025-09-07T06:33:49.2382010Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2390587Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2394499Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2395273Z #22 493.8 ptxas info : Used 252 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2395948Z #22 493.8 ptxas info : Compile time = 388.660 ms 2025-09-07T06:33:49.2399245Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2406811Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2412136Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2413214Z #22 493.8 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2414132Z #22 493.8 ptxas info : Compile time = 451.297 ms 2025-09-07T06:33:49.2418867Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2427389Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2432152Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2433190Z #22 493.8 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2433864Z #22 493.8 ptxas info : Compile time = 428.197 ms 2025-09-07T06:33:49.2437131Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2444169Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2449565Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2450678Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2451766Z #22 493.8 ptxas info : Compile time = 455.426 ms 2025-09-07T06:33:49.2456519Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2465227Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2470009Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2470812Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2471876Z #22 493.8 ptxas info : Compile time = 421.078 ms 2025-09-07T06:33:49.2475083Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2482591Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2487287Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2488298Z #22 493.8 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2489230Z #22 493.8 ptxas info : Compile time = 433.106 ms 2025-09-07T06:33:49.2494000Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2502271Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2506896Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2508390Z #22 493.8 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2509373Z #22 493.8 ptxas info : Compile time = 390.805 ms 2025-09-07T06:33:49.2511224Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2513862Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2515531Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2516308Z #22 493.8 ptxas info : Used 39 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:49.2517071Z #22 493.8 ptxas info : Compile time = 18.509 ms 2025-09-07T06:33:49.2521024Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2529363Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2534615Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2535686Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2536637Z #22 493.8 ptxas info : Compile time = 434.879 ms 2025-09-07T06:33:49.2540948Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2548334Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2551647Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2552435Z #22 493.8 ptxas info : Used 40 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:49.2553094Z #22 493.8 ptxas info : Compile time = 12.798 ms 2025-09-07T06:33:49.2556535Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2565237Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:49.2569924Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2570919Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:33:49.2572047Z #22 493.8 ptxas info : Compile time = 389.761 ms 2025-09-07T06:33:49.2574400Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2578220Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2580703Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2581788Z #22 493.8 ptxas info : Used 45 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:49.2582726Z #22 493.8 ptxas info : Compile time = 20.346 ms 2025-09-07T06:33:49.2587491Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2594180Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2598043Z #22 493.8 32 bytes 
stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:33:49.2599129Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2600397Z #22 493.8 ptxas info : Compile time = 714.741 ms 2025-09-07T06:33:49.2604946Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2613885Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2618725Z #22 493.8 64 bytes stack frame, 56 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:49.2620068Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2621259Z #22 493.8 ptxas info : Compile time = 699.188 ms 2025-09-07T06:33:49.2626334Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2632457Z #22 
493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2636395Z #22 493.8 72 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:49.2637678Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2638868Z #22 493.8 ptxas info : Compile time = 759.947 ms 2025-09-07T06:33:49.2643685Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2652598Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2657729Z #22 493.8 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:49.2659269Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2660482Z #22 493.8 ptxas info : Compile time = 714.880 ms 2025-09-07T06:33:49.2665164Z #22 
493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2671753Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2675265Z #22 493.8 88 bytes stack frame, 92 bytes spill stores, 124 bytes spill loads 2025-09-07T06:33:49.2676497Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2677652Z #22 493.8 ptxas info : Compile time = 803.190 ms 2025-09-07T06:33:49.2682305Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2691447Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2696371Z #22 493.8 88 bytes stack frame, 84 bytes spill stores, 112 bytes spill loads 2025-09-07T06:33:49.2697698Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2698878Z #22 493.8 ptxas info : Compile time = 767.056 ms 2025-09-07T06:33:49.2703455Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2709571Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2713781Z #22 493.8 104 bytes stack frame, 104 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:49.2715178Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2716362Z #22 493.8 ptxas info : Compile time = 845.883 ms 2025-09-07T06:33:49.2721290Z #22 493.8 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2730579Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2735603Z #22 493.8 104 bytes stack frame, 104 bytes spill stores, 116 bytes spill loads 2025-09-07T06:33:49.2736993Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2738190Z #22 493.8 ptxas info : Compile time = 801.627 ms 2025-09-07T06:33:49.2742644Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2748398Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2752016Z #22 493.8 88 bytes stack frame, 88 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:49.2755108Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2756330Z #22 493.8 ptxas info : Compile time = 753.368 ms 2025-09-07T06:33:49.2760797Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2768986Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2773911Z #22 493.8 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:49.2775236Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2776422Z #22 493.8 ptxas info : Compile time = 695.826 ms 2025-09-07T06:33:49.2778851Z #22 493.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2782568Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2784622Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2785400Z #22 493.8 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:49.2786082Z #22 493.8 ptxas info : Compile time = 27.120 ms 2025-09-07T06:33:49.2789494Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2797242Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2802033Z #22 493.8 96 bytes stack frame, 96 bytes spill stores, 100 bytes spill loads 2025-09-07T06:33:49.2803348Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2804496Z #22 493.8 ptxas info : Compile time = 751.247 ms 2025-09-07T06:33:49.2808858Z #22 493.8 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2817166Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2821588Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2822686Z #22 493.8 ptxas info : Used 56 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:49.2823490Z #22 493.8 ptxas info : Compile time = 17.813 ms 2025-09-07T06:33:49.2826732Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2833079Z #22 493.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:49.2837845Z #22 493.8 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:33:49.2839065Z #22 493.8 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:49.2840261Z #22 493.8 ptxas info : Compile time = 699.180 ms 2025-09-07T06:33:49.2842634Z #22 493.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:49.2847006Z #22 493.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:49.2849877Z #22 493.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:49.2850928Z #22 493.8 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:49.2852022Z #22 493.8 ptxas info : Compile time = 30.473 ms 2025-09-07T06:33:54.4733995Z #22 499.3 [31/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:54.6270306Z #22 499.3 ptxas info : 11 bytes gmem 2025-09-07T06:33:54.6274785Z #22 499.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6283797Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6288125Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6289035Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6289735Z #22 499.3 ptxas info : Compile time = 2.171 ms 2025-09-07T06:33:54.6293899Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6301698Z #22 499.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6306049Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6307069Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6307911Z #22 499.3 ptxas info : Compile time = 1.112 ms 2025-09-07T06:33:54.6312274Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6320931Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6325818Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6415282Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6416481Z #22 499.3 ptxas info : Compile time = 21.199 ms 2025-09-07T06:33:54.6421102Z #22 499.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6429928Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6434771Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6435678Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6436332Z #22 499.3 ptxas info : Compile time = 0.820 ms 2025-09-07T06:33:54.6440448Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6448412Z #22 499.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6540437Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6541468Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6542292Z #22 499.3 ptxas info : Compile time = 0.653 ms 2025-09-07T06:33:54.6547698Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6557234Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6562549Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6563597Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6564408Z #22 499.3 ptxas info : Compile time = 0.552 ms 
2025-09-07T06:33:54.6569602Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6578516Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6583243Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6584154Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6584939Z #22 499.3 ptxas info : Compile time = 0.518 ms 2025-09-07T06:33:54.6590134Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6598845Z #22 499.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6603370Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6604135Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6604759Z #22 499.3 ptxas info : Compile time = 0.567 ms 2025-09-07T06:33:54.6608860Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6617752Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6622555Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6623483Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6624309Z #22 499.3 ptxas info : Compile time = 0.543 ms 
2025-09-07T06:33:54.6629709Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6638994Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6643996Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6644984Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6645787Z #22 499.3 ptxas info : Compile time = 0.533 ms 2025-09-07T06:33:54.6651683Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6661338Z #22 499.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6667019Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6667988Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6668740Z #22 499.3 ptxas info : Compile time = 0.511 ms 2025-09-07T06:33:54.6673816Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.6683398Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6688781Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6689749Z #22 499.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.6690501Z #22 499.3 ptxas info : Compile time = 0.512 ms 
2025-09-07T06:33:54.6691379Z #22 499.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:54.6696000Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6705214Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6709515Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6710574Z #22 499.3 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:33:54.6711471Z #22 499.3 ptxas info : Compile time = 674.852 ms 2025-09-07T06:33:54.6716292Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6725291Z #22 499.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6730289Z #22 499.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.6731561Z #22 499.3 ptxas info : Used 250 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:33:54.6732569Z #22 499.3 ptxas info : Compile time = 992.951 ms 2025-09-07T06:33:54.6737273Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6745894Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6751047Z #22 499.3 96 bytes stack frame, 124 bytes spill stores, 204 bytes spill loads 2025-09-07T06:33:54.6752366Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 96 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:33:54.6753399Z #22 499.3 ptxas info : Compile time = 1204.223 ms 
2025-09-07T06:33:54.6758014Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6765327Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6769248Z #22 499.3 144 bytes stack frame, 156 bytes spill stores, 180 bytes spill loads 2025-09-07T06:33:54.6770309Z #22 499.3 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:33:54.6771414Z #22 499.3 ptxas info : Compile time = 718.584 ms 2025-09-07T06:33:54.6775739Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6784395Z #22 499.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6788783Z #22 499.3 56 bytes stack frame, 72 bytes spill stores, 76 bytes spill loads 2025-09-07T06:33:54.6790163Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 56 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.6791229Z #22 499.3 ptxas info : Compile time = 959.456 ms 2025-09-07T06:33:54.6795555Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6803992Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6808657Z #22 499.3 48 bytes stack frame, 60 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:54.6809922Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 48 bytes 
cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.6811224Z #22 499.3 ptxas info : Compile time = 1698.701 ms 2025-09-07T06:33:54.6815704Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6824256Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6828873Z #22 499.3 112 bytes stack frame, 160 bytes spill stores, 228 bytes spill loads 2025-09-07T06:33:54.6830186Z #22 499.3 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:33:54.6831325Z #22 499.3 ptxas info : Compile time = 1431.137 ms 2025-09-07T06:33:54.6836098Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6844701Z #22 499.3 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6849776Z #22 499.3 112 bytes stack frame, 220 bytes spill stores, 316 bytes spill loads 2025-09-07T06:33:54.6850909Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.6852467Z #22 499.3 ptxas info : Compile time = 1784.724 ms 2025-09-07T06:33:54.6856776Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6865246Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6869805Z #22 499.3 104 bytes stack frame, 220 bytes spill stores, 368 bytes spill loads 2025-09-07T06:33:54.6871098Z #22 499.3 ptxas info : Used 255 registers, used 
6 barriers, 104 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.6872256Z #22 499.3 ptxas info : Compile time = 3063.142 ms 2025-09-07T06:33:54.6877280Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6885867Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6889987Z #22 499.3 144 bytes stack frame, 232 bytes spill stores, 448 bytes spill loads 2025-09-07T06:33:54.6891181Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:33:54.6892116Z #22 499.3 ptxas info : Compile time = 1353.961 ms 2025-09-07T06:33:54.6896316Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:33:54.6905123Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6909662Z #22 499.3 144 bytes stack frame, 248 bytes spill stores, 440 bytes spill loads 2025-09-07T06:33:54.6910806Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.6912068Z #22 499.3 ptxas info : Compile time = 2253.214 ms 2025-09-07T06:33:54.6916434Z #22 499.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.6924915Z #22 499.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.6929628Z #22 499.3 136 bytes stack frame, 240 bytes spill stores, 320 bytes spill loads 
2025-09-07T06:33:54.6930859Z #22 499.3 ptxas info : Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.6931969Z #22 499.3 ptxas info : Compile time = 4367.994 ms 2025-09-07T06:33:54.7473334Z #22 499.5 [32/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:54.9017227Z #22 499.5 ptxas info : 11 bytes gmem 2025-09-07T06:33:54.9022106Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9031423Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9036393Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9037367Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9038211Z #22 499.5 ptxas info : Compile time = 2.035 ms 2025-09-07T06:33:54.9043356Z #22 499.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9053064Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9058202Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9059183Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9060003Z #22 499.5 ptxas info : Compile time = 21.485 ms 2025-09-07T06:33:54.9065466Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9074611Z #22 499.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9079864Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9080824Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9081670Z #22 499.5 ptxas info : Compile time = 1.204 ms 2025-09-07T06:33:54.9086842Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9096369Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9101912Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9102929Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9103738Z #22 499.5 ptxas info : Compile time = 0.818 ms 2025-09-07T06:33:54.9109269Z #22 499.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9119515Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9125130Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9126124Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9126975Z #22 499.5 ptxas info : Compile time = 0.800 ms 2025-09-07T06:33:54.9132356Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9142650Z #22 499.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9148087Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9151647Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9152471Z #22 499.5 ptxas info : Compile time = 0.754 ms 2025-09-07T06:33:54.9157589Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9166977Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9172492Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9173521Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9174336Z #22 499.5 ptxas info : Compile time = 0.682 ms 2025-09-07T06:33:54.9179864Z #22 499.5 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9190096Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9195715Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9196717Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9197534Z #22 499.5 ptxas info : Compile time = 0.732 ms 2025-09-07T06:33:54.9203008Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9213643Z #22 499.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9219209Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9220218Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9220947Z #22 499.5 ptxas info : Compile time = 0.671 ms 2025-09-07T06:33:54.9253510Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9263659Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9269119Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9270345Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9271287Z #22 499.5 ptxas info : Compile time = 0.666 ms 
2025-09-07T06:33:54.9276715Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9286537Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9292260Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9293282Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9294081Z #22 499.5 ptxas info : Compile time = 0.638 ms 2025-09-07T06:33:54.9299193Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:54.9308140Z #22 499.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9313491Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9314415Z #22 499.5 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:54.9315207Z #22 499.5 ptxas info : Compile time = 0.636 ms 2025-09-07T06:33:54.9315941Z #22 499.5 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:54.9320727Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9328752Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9333237Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9334284Z #22 499.5 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:33:54.9335186Z #22 499.5 ptxas info : Compile time = 
1036.577 ms 2025-09-07T06:33:54.9340089Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9348238Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9352944Z #22 499.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:54.9353909Z #22 499.5 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:33:54.9354739Z #22 499.5 ptxas info : Compile time = 1902.127 ms 2025-09-07T06:33:54.9359075Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9367201Z #22 499.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9371918Z #22 499.5 216 bytes stack frame, 232 bytes spill stores, 332 bytes spill loads 2025-09-07T06:33:54.9373192Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 216 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:33:54.9374216Z #22 499.5 ptxas info : Compile time = 2323.307 ms 2025-09-07T06:33:54.9378496Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9386490Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9390828Z #22 499.5 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:54.9391999Z #22 499.5 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:33:54.9393072Z #22 499.5 
ptxas info : Compile time = 1265.383 ms 2025-09-07T06:33:54.9398031Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9407392Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9412359Z #22 499.5 176 bytes stack frame, 192 bytes spill stores, 324 bytes spill loads 2025-09-07T06:33:54.9414256Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.9415468Z #22 499.5 ptxas info : Compile time = 1645.031 ms 2025-09-07T06:33:54.9420117Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9428356Z #22 499.5 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9433939Z #22 499.5 248 bytes stack frame, 324 bytes spill stores, 528 bytes spill loads 2025-09-07T06:33:54.9435264Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 248 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.9436394Z #22 499.5 ptxas info : Compile time = 3142.013 ms 2025-09-07T06:33:54.9441282Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9449237Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9457998Z #22 499.5 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:33:54.9459459Z #22 499.5 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack 
size, 1400 bytes cmem[0] 2025-09-07T06:33:54.9460719Z #22 499.5 ptxas info : Compile time = 2518.916 ms 2025-09-07T06:33:54.9466448Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9477225Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9482808Z #22 499.5 48 bytes stack frame, 64 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:54.9484216Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 48 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.9485445Z #22 499.5 ptxas info : Compile time = 3210.799 ms 2025-09-07T06:33:54.9491356Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:54.9501865Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9507933Z #22 499.5 112 bytes stack frame, 188 bytes spill stores, 280 bytes spill loads 2025-09-07T06:33:54.9509383Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.9510677Z #22 499.5 ptxas info : Compile time = 5415.506 ms 2025-09-07T06:33:54.9515956Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9526114Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9531901Z #22 499.5 272 bytes stack frame, 572 bytes spill stores, 700 bytes spill loads 2025-09-07T06:33:54.9533387Z #22 
499.5 ptxas info : Used 255 registers, used 6 barriers, 272 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:33:54.9534684Z #22 499.5 ptxas info : Compile time = 2525.417 ms 2025-09-07T06:33:54.9540503Z #22 499.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9551297Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9557063Z #22 499.5 256 bytes stack frame, 360 bytes spill stores, 636 bytes spill loads 2025-09-07T06:33:54.9558547Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 256 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.9559798Z #22 499.5 ptxas info : Compile time = 2515.970 ms 2025-09-07T06:33:54.9565257Z #22 499.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:54.9575796Z #22 499.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:54.9581893Z #22 499.5 288 bytes stack frame, 412 bytes spill stores, 748 bytes spill loads 2025-09-07T06:33:54.9583387Z #22 499.5 ptxas info : Used 255 registers, used 6 barriers, 288 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:54.9584648Z #22 499.5 ptxas info : Compile time = 4980.298 ms 2025-09-07T06:33:55.0747580Z #22 499.9 [33/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:55.2295024Z #22 499.9 ptxas info : 11 bytes gmem 2025-09-07T06:33:55.2299432Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2308283Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2313079Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2314035Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2314886Z #22 499.9 ptxas info : Compile time = 2.149 ms 2025-09-07T06:33:55.2320386Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2329982Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2335413Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2336426Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2337270Z #22 499.9 ptxas info : Compile time = 21.377 ms 2025-09-07T06:33:55.2342347Z #22 499.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2351922Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2356797Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2357522Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2358140Z #22 499.9 ptxas info : Compile time = 0.985 ms 2025-09-07T06:33:55.2363350Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2373010Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2378252Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2379245Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2380100Z #22 499.9 ptxas info : Compile time = 0.603 ms 2025-09-07T06:33:55.2385752Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2396249Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2401842Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2402870Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2403678Z #22 499.9 ptxas info : Compile time = 0.554 ms 
2025-09-07T06:33:55.2409319Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2419730Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2425366Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2426390Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2427466Z #22 499.9 ptxas info : Compile time = 0.525 ms 2025-09-07T06:33:55.2432762Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2442256Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2447464Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2448486Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2449690Z #22 499.9 ptxas info : Compile time = 0.526 ms 2025-09-07T06:33:55.2455392Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2465962Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2471609Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2472560Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2473407Z #22 499.9 ptxas info : Compile time = 0.538 ms 
2025-09-07T06:33:55.2478992Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2489231Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2495043Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2496047Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2496892Z #22 499.9 ptxas info : Compile time = 0.528 ms 2025-09-07T06:33:55.2502429Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2512648Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2518107Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2519084Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2519928Z #22 499.9 ptxas info : Compile time = 0.519 ms 2025-09-07T06:33:55.2525578Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2536020Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2541823Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2542861Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2543698Z #22 499.9 ptxas info : Compile time = 0.503 ms 
2025-09-07T06:33:55.2649925Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:55.2659367Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2664080Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2664919Z #22 499.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:55.2665577Z #22 499.9 ptxas info : Compile time = 0.493 ms 2025-09-07T06:33:55.2666223Z #22 499.9 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:55.2670651Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2679193Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2684116Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2685276Z #22 499.9 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:33:55.2686247Z #22 499.9 ptxas info : Compile time = 936.231 ms 2025-09-07T06:33:55.2691661Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2701158Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2706397Z #22 499.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:55.2707564Z #22 499.9 ptxas info : Used 250 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:33:55.2708923Z #22 499.9 ptxas info : Compile time = 1708.505 ms 2025-09-07T06:33:55.2713747Z #22 499.9 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2722709Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2727601Z #22 499.9 96 bytes stack frame, 124 bytes spill stores, 204 bytes spill loads 2025-09-07T06:33:55.2728895Z #22 499.9 ptxas info : Used 255 registers, used 6 barriers, 96 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:33:55.2730096Z #22 499.9 ptxas info : Compile time = 1988.220 ms 2025-09-07T06:33:55.2735178Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2744640Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2750186Z #22 499.9 144 bytes stack frame, 156 bytes spill stores, 180 bytes spill loads 2025-09-07T06:33:55.2751567Z #22 499.9 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:33:55.2752831Z #22 499.9 ptxas info : Compile time = 1176.524 ms 2025-09-07T06:33:55.2758367Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2768185Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2773383Z #22 499.9 56 bytes stack frame, 72 bytes spill stores, 76 bytes spill loads 2025-09-07T06:33:55.2774483Z #22 499.9 ptxas info : Used 255 registers, used 6 barriers, 56 bytes cumulative stack 
size, 1408 bytes cmem[0] 2025-09-07T06:33:55.2775565Z #22 499.9 ptxas info : Compile time = 1551.437 ms 2025-09-07T06:33:55.2780846Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2790673Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2795619Z #22 499.9 48 bytes stack frame, 60 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:55.2796774Z #22 499.9 ptxas info : Used 255 registers, used 6 barriers, 48 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:55.2797791Z #22 499.9 ptxas info : Compile time = 2714.987 ms 2025-09-07T06:33:55.2802693Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2811328Z 
#22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2816187Z #22 499.9 112 bytes stack frame, 160 bytes spill stores, 228 bytes spill loads 2025-09-07T06:33:55.2817599Z #22 499.9 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:33:55.2818836Z #22 499.9 ptxas info : Compile time = 2391.170 ms 2025-09-07T06:33:55.2824150Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2831883Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2835938Z #22 499.9 112 bytes stack frame, 220 bytes spill stores, 316 bytes spill loads 2025-09-07T06:33:55.2837017Z #22 499.9 ptxas info : Used 255 
registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:55.2838085Z #22 499.9 ptxas info : Compile time = 2874.988 ms 2025-09-07T06:33:55.2843770Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2852728Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2858351Z #22 499.9 104 bytes stack frame, 220 bytes spill stores, 368 bytes spill loads 2025-09-07T06:33:55.2859817Z #22 499.9 ptxas info : Used 255 registers, used 6 barriers, 104 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:55.2861093Z #22 499.9 ptxas info : Compile time = 4918.859 ms 2025-09-07T06:33:55.2866583Z #22 499.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2876969Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2882549Z #22 499.9 144 bytes stack frame, 232 bytes spill stores, 448 bytes spill loads 2025-09-07T06:33:55.2884899Z #22 499.9 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:33:55.2886176Z #22 499.9 ptxas info : Compile time = 2022.814 ms 2025-09-07T06:33:55.2892026Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2902517Z #22 499.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2908299Z #22 499.9 144 bytes stack frame, 248 bytes spill stores, 440 bytes spill loads 2025-09-07T06:33:55.2909773Z #22 499.9 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:55.2911056Z #22 499.9 ptxas info : Compile time = 1494.993 ms 2025-09-07T06:33:55.2917149Z #22 499.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:55.2927603Z #22 499.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:55.2933304Z #22 499.9 136 bytes stack frame, 240 bytes spill stores, 320 bytes spill loads 2025-09-07T06:33:55.2934762Z #22 499.9 ptxas info : Used 255 registers, used 6 
barriers, 136 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:33:55.2935997Z #22 499.9 ptxas info : Compile time = 2687.345 ms 2025-09-07T06:33:57.5831217Z #22 502.4 [34/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a 
-DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:57.7397115Z #22 502.4 ptxas info : 11 bytes gmem 2025-09-07T06:33:57.7401536Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7410170Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7414715Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7415615Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7416354Z #22 502.4 ptxas info : Compile time = 1.969 ms 2025-09-07T06:33:57.7420657Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7428685Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7433124Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7434050Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7434764Z #22 502.4 ptxas info : Compile time = 21.136 ms 2025-09-07T06:33:57.7439408Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7447392Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7452166Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7453072Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7453824Z #22 502.4 ptxas info : Compile time = 0.789 ms 2025-09-07T06:33:57.7458172Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7466129Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7470182Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7470973Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7471952Z #22 502.4 ptxas info : Compile time = 0.619 ms 2025-09-07T06:33:57.7475889Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7482789Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7486812Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7487651Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7488388Z #22 502.4 ptxas info : Compile time = 0.587 ms 2025-09-07T06:33:57.7493117Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7501886Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7506872Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7507817Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7508610Z #22 502.4 ptxas info : Compile time = 0.616 ms 2025-09-07T06:33:57.7513162Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7521122Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7525029Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7525837Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7526479Z #22 502.4 ptxas info : Compile time = 0.570 ms 2025-09-07T06:33:57.7530552Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7537739Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7541620Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7542400Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7543065Z #22 502.4 ptxas info : Compile time = 0.546 ms 2025-09-07T06:33:57.7546835Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7553871Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7557646Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7558728Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7559393Z #22 502.4 ptxas info : Compile time = 0.560 ms 2025-09-07T06:33:57.7563161Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7569930Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7573755Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7574546Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7575199Z #22 502.4 ptxas info : Compile time = 0.509 ms 2025-09-07T06:33:57.7577119Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7580319Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.7582250Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7583061Z #22 502.4 ptxas info : Used 56 registers, used 0 barriers 2025-09-07T06:33:57.7583740Z #22 502.4 ptxas info : Compile time = 49.472 ms 2025-09-07T06:33:57.7587886Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7594768Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7598590Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7599376Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7600032Z #22 502.4 ptxas info : Compile time = 1.010 ms 2025-09-07T06:33:57.7603449Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7609452Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.7613267Z #22 
502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7614063Z #22 502.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:33:57.7614733Z #22 502.4 ptxas info : Compile time = 63.999 ms 2025-09-07T06:33:57.7618569Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7625420Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7629296Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7630096Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7630750Z #22 502.4 ptxas info : Compile time = 1.033 ms 2025-09-07T06:33:57.7632675Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7635819Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:57.7637829Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:57.7638651Z #22 502.4 ptxas info : Used 60 registers, used 0 barriers 2025-09-07T06:33:57.7639344Z #22 502.4 ptxas info : Compile time = 50.170 ms 2025-09-07T06:33:57.7644184Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7651599Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7655548Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7656338Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7656982Z #22 502.4 ptxas info : Compile time = 0.883 ms 2025-09-07T06:33:57.7660805Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7667641Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7671900Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7672703Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7673356Z #22 502.4 ptxas info : Compile time = 0.698 ms 2025-09-07T06:33:57.7677062Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7683835Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7687655Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7688444Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7689123Z #22 502.4 ptxas info : Compile time = 0.671 ms 2025-09-07T06:33:57.7693123Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7700284Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7704029Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7704817Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7705457Z #22 502.4 ptxas info : Compile time = 0.588 ms 2025-09-07T06:33:57.7709282Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7716285Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:33:57.7720206Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7721057Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7721727Z #22 502.4 ptxas info : Compile time = 0.544 ms 2025-09-07T06:33:57.7725795Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7733375Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7737301Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7738099Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7738881Z #22 502.4 ptxas info : Compile time = 0.516 ms 2025-09-07T06:33:57.7742688Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7750519Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7754995Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7755909Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7756966Z #22 502.4 ptxas info : Compile time = 0.553 ms 2025-09-07T06:33:57.7761427Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7769588Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7774231Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7775142Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7775890Z #22 502.4 ptxas info : Compile time = 0.523 ms 2025-09-07T06:33:57.7779796Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7786462Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7790510Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7791279Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7791924Z #22 502.4 ptxas info : Compile time = 0.522 ms 2025-09-07T06:33:57.7795475Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7802203Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7806691Z #22 502.4 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7807633Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7808397Z #22 502.4 ptxas info : Compile time = 20.918 ms 2025-09-07T06:33:57.7810685Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7814636Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.7816942Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7818175Z #22 502.4 ptxas info : Used 106 registers, used 0 barriers 2025-09-07T06:33:57.7818940Z #22 502.4 ptxas info : Compile time = 94.897 ms 2025-09-07T06:33:57.7823435Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7831693Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7836361Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7837297Z #22 502.4 ptxas info 
: Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7838111Z #22 502.4 ptxas info : Compile time = 0.741 ms 2025-09-07T06:33:57.7842291Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7850401Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.7853959Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7854729Z #22 502.4 ptxas info : Used 86 registers, used 1 barriers 2025-09-07T06:33:57.7855388Z #22 502.4 ptxas info : Compile time = 81.701 ms 2025-09-07T06:33:57.7859131Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7866030Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7869894Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7870727Z #22 502.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:33:57.7871402Z #22 502.4 ptxas info : Compile time = 21.099 ms 2025-09-07T06:33:57.7873412Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:57.7877388Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:57.7879387Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.7880236Z #22 502.4 ptxas info : Used 96 registers, used 0 barriers 2025-09-07T06:33:57.7880922Z #22 502.4 ptxas info : Compile time = 122.116 ms 2025-09-07T06:33:57.7881627Z #22 502.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:33:57.7886031Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.7893966Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7898013Z #22 502.4 184 bytes stack frame, 188 bytes spill stores, 232 bytes spill loads 2025-09-07T06:33:57.7899152Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 184 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.7900273Z #22 502.4 ptxas info : Compile time = 474.209 ms 2025-09-07T06:33:57.7904458Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.7911896Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7915911Z #22 502.4 176 bytes stack frame, 184 bytes spill stores, 224 bytes spill loads 2025-09-07T06:33:57.7917035Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.7918059Z #22 502.4 ptxas info : Compile time = 437.443 ms 2025-09-07T06:33:57.7922025Z #22 502.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.7929303Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7933577Z #22 502.4 176 bytes stack frame, 188 bytes spill stores, 200 bytes spill loads 2025-09-07T06:33:57.7934939Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.7935939Z #22 502.4 ptxas info : Compile time = 546.955 ms 2025-09-07T06:33:57.7939974Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.7947278Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7952102Z #22 502.4 176 bytes stack frame, 188 bytes spill stores, 196 bytes spill loads 2025-09-07T06:33:57.7953282Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.7954286Z #22 502.4 ptxas info : Compile time = 445.706 ms 2025-09-07T06:33:57.7958992Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.7967386Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7971827Z #22 502.4 208 bytes stack frame, 220 bytes spill stores, 272 bytes spill loads 2025-09-07T06:33:57.7973044Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.7974116Z #22 502.4 ptxas info : Compile time = 567.640 ms 2025-09-07T06:33:57.7978267Z #22 502.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.7985796Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.7989949Z #22 502.4 208 bytes stack frame, 220 bytes spill stores, 264 bytes spill loads 2025-09-07T06:33:57.7991173Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.7992250Z #22 502.4 ptxas info : Compile time = 495.645 ms 2025-09-07T06:33:57.7996822Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8004351Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8008589Z #22 502.4 216 bytes stack frame, 216 bytes spill stores, 232 bytes spill loads 2025-09-07T06:33:57.8009807Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8010850Z #22 502.4 ptxas info : Compile time = 590.916 ms 2025-09-07T06:33:57.8015094Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8022675Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8027301Z #22 502.4 216 bytes stack frame, 216 bytes spill stores, 228 bytes spill loads 2025-09-07T06:33:57.8028619Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8029639Z #22 502.4 ptxas info : Compile time = 507.327 ms 2025-09-07T06:33:57.8033810Z #22 502.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8041327Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8045582Z #22 502.4 216 bytes stack frame, 224 bytes spill stores, 264 bytes spill loads 2025-09-07T06:33:57.8046887Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8047952Z #22 502.4 ptxas info : Compile time = 548.004 ms 2025-09-07T06:33:57.8052328Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8060387Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8064649Z #22 502.4 216 bytes stack frame, 224 bytes spill stores, 256 bytes spill loads 2025-09-07T06:33:57.8065861Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8067006Z #22 502.4 ptxas info : Compile time = 461.020 ms 2025-09-07T06:33:57.8069077Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8072460Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.8074693Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.8075729Z #22 502.4 ptxas info : Used 60 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:57.8076623Z #22 502.4 ptxas info : Compile time = 24.039 ms 2025-09-07T06:33:57.8081007Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8089093Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8093578Z #22 502.4 208 bytes stack frame, 208 bytes spill stores, 220 bytes spill loads 2025-09-07T06:33:57.8094929Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8096107Z #22 502.4 ptxas info : Compile time = 559.537 ms 2025-09-07T06:33:57.8099983Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8106766Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.8110368Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.8111289Z #22 502.4 ptxas info : Used 51 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:57.8112101Z #22 502.4 ptxas info : Compile time = 17.440 ms 2025-09-07T06:33:57.8116676Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8124433Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8128743Z #22 502.4 200 bytes stack frame, 200 bytes spill stores, 208 bytes spill loads 2025-09-07T06:33:57.8130013Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8131350Z #22 502.4 ptxas info : Compile time = 464.207 ms 2025-09-07T06:33:57.8133539Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8136950Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:57.8139162Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.8140256Z #22 502.4 ptxas info : Used 63 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:57.8141201Z #22 502.4 ptxas info : Compile time = 25.783 ms 2025-09-07T06:33:57.8145263Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8152932Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8157278Z #22 502.4 24 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:33:57.8158573Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8159676Z #22 502.4 ptxas info : Compile time = 655.539 ms 2025-09-07T06:33:57.8163969Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8171781Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8175937Z #22 502.4 56 bytes stack frame, 52 bytes spill stores, 76 bytes spill loads 2025-09-07T06:33:57.8177500Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8178607Z #22 502.4 ptxas info : Compile time = 593.054 ms 2025-09-07T06:33:57.8182544Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8190195Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8194590Z #22 502.4 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:33:57.8195745Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8196868Z #22 502.4 ptxas info : Compile time = 725.232 ms 2025-09-07T06:33:57.8201080Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8208440Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8212735Z #22 502.4 56 bytes stack frame, 56 bytes spill stores, 56 bytes spill loads 2025-09-07T06:33:57.8213954Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8214960Z #22 502.4 ptxas info : Compile time = 639.442 ms 2025-09-07T06:33:57.8219207Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8228020Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8233368Z #22 502.4 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:33:57.8234873Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8236236Z #22 502.4 ptxas info : Compile time = 769.359 ms 2025-09-07T06:33:57.8241793Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8250407Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8254786Z #22 502.4 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:33:57.8256073Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8257214Z #22 502.4 ptxas info : Compile time = 666.725 ms 2025-09-07T06:33:57.8262132Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8271728Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8277441Z #22 502.4 64 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:57.8278953Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8280263Z #22 502.4 ptxas info : Compile time = 763.690 ms 2025-09-07T06:33:57.8284460Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8292412Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8297522Z #22 502.4 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:57.8299002Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8300316Z #22 502.4 ptxas info : Compile time = 670.931 ms 2025-09-07T06:33:57.8305389Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8314975Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8318798Z #22 502.4 72 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:57.8319887Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8320977Z #22 502.4 ptxas info : Compile time = 706.444 ms 2025-09-07T06:33:57.8324809Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8331658Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8335787Z #22 502.4 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:57.8337057Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8338241Z #22 502.4 ptxas info : Compile time = 592.882 ms 2025-09-07T06:33:57.8340411Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8343857Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.8346039Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.8347121Z #22 502.4 ptxas info : Used 105 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:57.8348078Z #22 502.4 ptxas info : Compile time = 46.241 ms 2025-09-07T06:33:57.8352574Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8360509Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8365634Z #22 502.4 56 bytes stack frame, 60 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:57.8367134Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8368492Z #22 502.4 ptxas info : Compile time = 709.562 ms 2025-09-07T06:33:57.8373798Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8382502Z #22 502.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:33:57.8386918Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.8387825Z #22 502.4 ptxas info : Used 86 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:33:57.8388709Z #22 502.4 ptxas info : Compile time = 33.363 ms 2025-09-07T06:33:57.8392990Z #22 502.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8401893Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:57.8407556Z #22 502.4 56 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads 2025-09-07T06:33:57.8409065Z #22 502.4 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:33:57.8410400Z #22 502.4 ptxas info : Compile time = 598.749 ms 2025-09-07T06:33:57.8413235Z #22 502.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:57.8417666Z #22 502.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:33:57.8420054Z #22 502.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:57.8420938Z #22 502.4 ptxas info : Used 103 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:33:57.8421744Z #22 502.4 ptxas info : Compile time = 56.983 ms 2025-09-07T06:34:10.0523304Z #22 514.8 [35/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 
--generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:10.2058607Z #22 514.8 ptxas info : 11 bytes gmem 2025-09-07T06:34:10.2063161Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2070730Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2075800Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2076744Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2077536Z #22 514.8 ptxas info : Compile time = 
2.183 ms 2025-09-07T06:34:10.2082026Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2090276Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2094783Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2095639Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2096363Z #22 514.8 ptxas info : Compile time = 21.276 ms 2025-09-07T06:34:10.2100647Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2108741Z #22 514.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2113170Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2114056Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2114774Z #22 514.8 ptxas info : Compile time = 1.108 ms 2025-09-07T06:34:10.2119141Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2126653Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2130474Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2131510Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2132423Z #22 514.8 ptxas info : Compile time = 0.649 ms 2025-09-07T06:34:10.2137236Z #22 514.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2145893Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2151043Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2151965Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2152703Z #22 514.8 ptxas info : Compile time = 0.572 ms 2025-09-07T06:34:10.2157450Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2166606Z #22 514.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2171871Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2172793Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2173548Z #22 514.8 ptxas info : Compile time = 0.541 ms 2025-09-07T06:34:10.2178011Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2185156Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2189874Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2190747Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2191517Z #22 514.8 ptxas info : Compile time = 0.510 ms 2025-09-07T06:34:10.2196590Z #22 514.8 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2205891Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2210693Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2211794Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2212528Z #22 514.8 ptxas info : Compile time = 0.557 ms 2025-09-07T06:34:10.2217277Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2225833Z #22 514.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2230626Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2231796Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2232571Z #22 514.8 ptxas info : Compile time = 0.515 ms 2025-09-07T06:34:10.2237433Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2245839Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2250940Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2252136Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2252789Z #22 514.8 ptxas info : Compile time = 0.530 ms 2025-09-07T06:34:10.2260010Z 
#22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2270452Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2275434Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2276314Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2277080Z #22 514.8 ptxas info : Compile time = 0.508 ms 2025-09-07T06:34:10.2281888Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:10.2290437Z #22 514.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2295383Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2296279Z #22 514.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:10.2297007Z #22 514.8 ptxas info : Compile time = 0.509 ms 2025-09-07T06:34:10.2298082Z #22 514.8 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:34:10.2302323Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2309439Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2313157Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2314148Z #22 514.8 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:10.2314995Z #22 514.8 ptxas info : Compile time = 1022.799 ms 
2025-09-07T06:34:10.2319300Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2327133Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2331912Z #22 514.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:10.2332907Z #22 514.8 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:10.2333762Z #22 514.8 ptxas info : Compile time = 1750.960 ms 2025-09-07T06:34:10.2338159Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2346199Z #22 514.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2351011Z #22 514.8 216 bytes stack frame, 232 bytes spill stores, 332 bytes spill loads 2025-09-07T06:34:10.2352279Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 216 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:34:10.2353384Z #22 514.8 ptxas info : Compile time = 2280.269 ms 2025-09-07T06:34:10.2357673Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2364459Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2368185Z #22 514.8 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:34:10.2369253Z #22 514.8 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:10.2370204Z #22 514.8 ptxas info : 
Compile time = 1264.318 ms 2025-09-07T06:34:10.2375035Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2383990Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2389159Z #22 514.8 176 bytes stack frame, 192 bytes spill stores, 324 bytes spill loads 2025-09-07T06:34:10.2390477Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:34:10.2391586Z #22 514.8 ptxas info : Compile time = 1597.593 ms 2025-09-07T06:34:10.2396559Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2405707Z #22 514.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2410452Z #22 514.8 248 bytes stack frame, 324 bytes spill stores, 528 bytes spill loads 2025-09-07T06:34:10.2411867Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 248 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:34:10.2413002Z #22 514.8 ptxas info : Compile time = 3194.796 ms 2025-09-07T06:34:10.2417205Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2424650Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2429556Z #22 514.8 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:34:10.2430786Z #22 514.8 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 
2025-09-07T06:34:10.2431883Z #22 514.8 ptxas info : Compile time = 2597.118 ms 2025-09-07T06:34:10.2436843Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2445998Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2451255Z #22 514.8 48 bytes stack frame, 64 bytes spill stores, 68 bytes spill loads 2025-09-07T06:34:10.2452580Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 48 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:34:10.2453705Z #22 514.8 ptxas info : Compile time = 3366.547 ms 2025-09-07T06:34:10.2458676Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2467496Z #22 514.8 
ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2472002Z #22 514.8 112 bytes stack frame, 188 bytes spill stores, 280 bytes spill loads 2025-09-07T06:34:10.2473252Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:34:10.2474316Z #22 514.8 ptxas info : Compile time = 5730.191 ms 2025-09-07T06:34:10.2479213Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2487729Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2493060Z #22 514.8 272 bytes stack frame, 572 bytes spill stores, 700 bytes spill loads 2025-09-07T06:34:10.2494271Z #22 514.8 ptxas info : Used 255 registers, used 6 
barriers, 272 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:34:10.2495379Z #22 514.8 ptxas info : Compile time = 2680.225 ms 2025-09-07T06:34:10.2499902Z #22 514.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2527638Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2533175Z #22 514.8 256 bytes stack frame, 360 bytes spill stores, 636 bytes spill loads 2025-09-07T06:34:10.2534432Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 256 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:34:10.2535522Z #22 514.8 ptxas info : Compile time = 2739.016 ms 2025-09-07T06:34:10.2540230Z #22 514.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:10.2548384Z #22 514.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:10.2552825Z #22 514.8 288 bytes stack frame, 412 bytes spill stores, 748 bytes spill loads 2025-09-07T06:34:10.2553897Z #22 514.8 ptxas info : Used 255 registers, used 6 barriers, 288 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:34:10.2554825Z #22 514.8 ptxas info : Compile time = 5147.604 ms 2025-09-07T06:34:16.1355151Z #22 520.9 [36/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:16.2898386Z #22 520.9 ptxas info : 11 bytes gmem 2025-09-07T06:34:16.2984281Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.2992096Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.2997196Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.2998332Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.2999267Z #22 520.9 ptxas info : Compile time = 1.946 ms 2025-09-07T06:34:16.3004804Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3015033Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3019364Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3020276Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3021373Z #22 520.9 ptxas info : Compile time = 0.913 ms 2025-09-07T06:34:16.3026290Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3036071Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3041620Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3042765Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3043703Z #22 520.9 ptxas info : Compile time = 0.581 ms 2025-09-07T06:34:16.3047729Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3058543Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:34:16.3062965Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3063778Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3064493Z #22 520.9 ptxas info : Compile time = 20.594 ms 2025-09-07T06:34:16.3068452Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3075153Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3079129Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3079951Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3080615Z #22 520.9 ptxas info : Compile time = 0.938 ms 2025-09-07T06:34:16.3084606Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3092414Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3096231Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3097036Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3097696Z #22 520.9 ptxas info : Compile time = 0.649 ms 2025-09-07T06:34:16.3101561Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3108600Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3112856Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3113724Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3114686Z #22 520.9 ptxas info : Compile time = 0.597 ms 2025-09-07T06:34:16.3118998Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3126581Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3130733Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3131786Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3132449Z #22 520.9 ptxas info : Compile time = 0.554 ms 2025-09-07T06:34:16.3136134Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3142973Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3147280Z #22 520.9 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3148194Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3149615Z #22 520.9 ptxas info : Compile time = 0.535 ms 2025-09-07T06:34:16.3153927Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3161126Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3164660Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3165431Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3166079Z #22 520.9 ptxas info : Compile time = 0.616 ms 2025-09-07T06:34:16.3170468Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3178543Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3183334Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3184208Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3184937Z #22 520.9 ptxas info : Compile time = 0.533 ms 2025-09-07T06:34:16.3189396Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3197703Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3202268Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3203195Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3203979Z #22 520.9 ptxas info : Compile time = 0.518 ms 2025-09-07T06:34:16.3208608Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3216302Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3219918Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3220678Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3221291Z #22 520.9 ptxas info : Compile time = 0.570 ms 2025-09-07T06:34:16.3224896Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3232724Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3237422Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3238544Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3239387Z #22 520.9 ptxas info : Compile time = 0.516 ms 2025-09-07T06:34:16.3243996Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3253740Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3258524Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3259451Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3260244Z #22 520.9 ptxas info : Compile time = 0.531 ms 2025-09-07T06:34:16.3264757Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3273005Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3277427Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3278291Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3279009Z #22 520.9 ptxas info : Compile time = 0.606 ms 2025-09-07T06:34:16.3283141Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3290842Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3295291Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3296157Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3296919Z #22 520.9 ptxas info : Compile time = 0.613 ms 2025-09-07T06:34:16.3300990Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3308058Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3311671Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3312429Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3313043Z #22 520.9 ptxas info : Compile time = 0.612 ms 2025-09-07T06:34:16.3316707Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3323347Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3327087Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3327893Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3328565Z #22 520.9 ptxas info : Compile time = 0.582 ms 2025-09-07T06:34:16.3333104Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3340789Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3345004Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3345858Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3346589Z #22 520.9 ptxas info : Compile time = 0.554 ms 2025-09-07T06:34:16.3356494Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3363962Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3368429Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3369307Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3370014Z #22 520.9 ptxas info : Compile time = 0.550 ms 2025-09-07T06:34:16.3374242Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3381451Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3385525Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3386407Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3387118Z #22 520.9 ptxas info : Compile time = 0.533 ms 2025-09-07T06:34:16.3389188Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3392585Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:16.3394731Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3395606Z #22 520.9 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:34:16.3396345Z #22 520.9 ptxas info : Compile time = 89.873 ms 2025-09-07T06:34:16.3400815Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3408426Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3412804Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3413613Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3414336Z #22 520.9 ptxas info : Compile time = 1.005 ms 2025-09-07T06:34:16.3418054Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3424809Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 
2025-09-07T06:34:16.3428897Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3429775Z #22 520.9 ptxas info : Used 80 registers, used 1 barriers 2025-09-07T06:34:16.3430496Z #22 520.9 ptxas info : Compile time = 94.481 ms 2025-09-07T06:34:16.3434324Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3441201Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:16.3445009Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3445850Z #22 520.9 ptxas info : Used 70 registers, used 1 barriers 2025-09-07T06:34:16.3446545Z #22 520.9 ptxas info : Compile time = 30.197 ms 2025-09-07T06:34:16.3450918Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3460527Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3464633Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3465509Z #22 520.9 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:16.3466199Z #22 520.9 ptxas info : Compile time = 0.937 ms 2025-09-07T06:34:16.3654004Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:16.3659935Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:16.3663250Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3665201Z #22 520.9 ptxas info : Used 74 registers, used 0 barriers 2025-09-07T06:34:16.3666968Z #22 520.9 ptxas info : Compile time = 42.127 ms 2025-09-07T06:34:16.3668694Z #22 520.9 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:34:16.3673859Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3683984Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3689354Z #22 520.9 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:16.3691709Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3693857Z #22 520.9 ptxas info : Compile time = 547.513 ms 2025-09-07T06:34:16.3698979Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3707542Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3715617Z #22 520.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:34:16.3716801Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3717877Z #22 520.9 ptxas info : Compile time = 527.129 ms 2025-09-07T06:34:16.3722442Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3729852Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3734118Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3735099Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3735901Z #22 520.9 ptxas info : Compile time = 604.178 ms 2025-09-07T06:34:16.3740062Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3747506Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3753487Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3754446Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3755328Z #22 520.9 ptxas info : Compile time = 557.409 ms 2025-09-07T06:34:16.3759381Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3766993Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3771442Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3772440Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3773330Z #22 520.9 ptxas info : Compile time = 707.634 ms 2025-09-07T06:34:16.3777352Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3785086Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3789156Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3790129Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3790950Z #22 520.9 ptxas info : Compile time = 628.616 ms 2025-09-07T06:34:16.3795063Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3802554Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3806721Z #22 520.9 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:34:16.3807925Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3808969Z #22 520.9 ptxas info : Compile time = 740.111 ms 2025-09-07T06:34:16.3813335Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3821207Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3825407Z #22 520.9 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:34:16.3826625Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3827685Z #22 520.9 ptxas info : Compile time = 671.489 ms 2025-09-07T06:34:16.3831748Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3839146Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3843193Z #22 520.9 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:16.3844363Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3845423Z #22 520.9 ptxas info : Compile time = 580.116 ms 2025-09-07T06:34:16.3849960Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3857224Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:34:16.3861221Z #22 520.9 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:16.3862399Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3863468Z #22 520.9 ptxas info : Compile time = 533.468 ms 2025-09-07T06:34:16.3867531Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3874524Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3878884Z #22 520.9 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:16.3880076Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3881087Z #22 520.9 ptxas info : Compile time = 609.141 ms 2025-09-07T06:34:16.3885092Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:34:16.3892551Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3896527Z #22 520.9 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:16.3897667Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3898698Z #22 520.9 ptxas info : Compile time = 568.475 ms 2025-09-07T06:34:16.3903124Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3910658Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3914807Z #22 520.9 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads 2025-09-07T06:34:16.3915971Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3916994Z #22 520.9 ptxas info : Compile time = 892.467 ms 
2025-09-07T06:34:16.3921073Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3928492Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3932932Z #22 520.9 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:34:16.3934292Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.3935338Z #22 520.9 ptxas info : Compile time = 723.918 ms 2025-09-07T06:34:16.3939553Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3946987Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3951407Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3952347Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3953188Z #22 520.9 ptxas info : Compile time = 930.545 ms 2025-09-07T06:34:16.3957291Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3965050Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3969198Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3970141Z #22 520.9 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3970950Z #22 520.9 ptxas info : Compile time = 627.566 ms 2025-09-07T06:34:16.3975213Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.3982586Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.3986807Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.3987739Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.3988571Z #22 520.9 ptxas info : Compile time = 949.442 ms 2025-09-07T06:34:16.3992784Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4001308Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4005801Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4006882Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.4007844Z #22 520.9 ptxas info : Compile time = 730.321 ms 2025-09-07T06:34:16.4012539Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4020653Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4025165Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4026241Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.4027187Z #22 520.9 ptxas info : Compile time = 934.511 ms 2025-09-07T06:34:16.4031875Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4039932Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4044452Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4045522Z #22 520.9 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.4046474Z #22 520.9 ptxas info : Compile time = 594.524 ms 2025-09-07T06:34:16.4050678Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4057128Z #22 520.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4060934Z #22 520.9 24 bytes stack frame, 24 bytes spill stores, 28 bytes spill loads 2025-09-07T06:34:16.4061942Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.4062828Z #22 520.9 ptxas info : Compile time = 905.144 ms 2025-09-07T06:34:16.4066266Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4072579Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4076200Z #22 520.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:34:16.4077236Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.4078154Z #22 520.9 ptxas info : Compile time = 713.242 ms 2025-09-07T06:34:16.4079982Z #22 520.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4083372Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:16.4085224Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4086021Z #22 520.9 ptxas info : Used 75 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:16.4086730Z #22 520.9 ptxas info : Compile time = 35.527 ms 2025-09-07T06:34:16.4090295Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4097033Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4100646Z #22 520.9 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:34:16.4101680Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:16.4102602Z #22 520.9 ptxas info : Compile time = 916.334 ms 2025-09-07T06:34:16.4105922Z #22 520.9 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4111960Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:16.4115217Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4116031Z #22 520.9 ptxas info : Used 80 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:16.4116773Z #22 520.9 ptxas info : Compile time = 44.223 ms 2025-09-07T06:34:16.4120020Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4125970Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:16.4129349Z #22 
520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4130191Z #22 520.9 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:16.4130966Z #22 520.9 ptxas info : Compile time = 27.439 ms 2025-09-07T06:34:16.4134953Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4141467Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:16.4145156Z #22 520.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4146024Z #22 520.9 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:16.4146759Z #22 520.9 ptxas info : Compile time = 622.949 ms 2025-09-07T06:34:16.4148601Z #22 520.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:16.4152077Z #22 520.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:16.4153910Z #22 520.9 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads 2025-09-07T06:34:16.4154742Z #22 520.9 ptxas info : Used 78 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:16.4155728Z #22 520.9 ptxas info : Compile time = 40.631 ms 2025-09-07T06:34:32.4806136Z #22 537.3 [37/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:32.4828204Z #22 537.3 ptxas info : 11 bytes gmem 2025-09-07T06:34:32.4833364Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4841379Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4846212Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4847188Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4848012Z #22 537.3 ptxas info : Compile time = 2.017 ms 2025-09-07T06:34:32.4853020Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:34:32.4861835Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4866835Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4867805Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4868568Z #22 537.3 ptxas info : Compile time = 0.920 ms 2025-09-07T06:34:32.4873432Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4882148Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4886665Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4887549Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4888287Z #22 537.3 ptxas info : Compile time = 0.641 ms 2025-09-07T06:34:32.4892955Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4901122Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4905746Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4906721Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4907516Z #22 537.3 ptxas info : Compile time = 0.623 ms 2025-09-07T06:34:32.4912446Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4920758Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4925256Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4926188Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4926980Z #22 537.3 ptxas info : Compile time = 0.662 ms 2025-09-07T06:34:32.4932000Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4940231Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4945111Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4946137Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4946973Z #22 537.3 ptxas info : Compile time = 0.665 ms 2025-09-07T06:34:32.4953705Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4960425Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4963791Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4964477Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4965352Z #22 537.3 ptxas info : Compile time = 0.633 ms 2025-09-07T06:34:32.4968693Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4974975Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4978667Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4979448Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4980085Z #22 537.3 ptxas info : Compile time = 0.606 ms 2025-09-07T06:34:32.4983872Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.4990212Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.4993916Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.4994754Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.4995447Z #22 537.3 ptxas info : Compile time = 0.663 ms 2025-09-07T06:34:32.4999494Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5005598Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5009129Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5009926Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5010558Z #22 537.3 ptxas info : Compile time = 0.614 ms 2025-09-07T06:34:32.5012680Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5016143Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5018339Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5019232Z #22 537.3 ptxas info : Used 60 registers, used 0 barriers 2025-09-07T06:34:32.5019975Z #22 537.3 ptxas info : Compile time = 28.659 ms 2025-09-07T06:34:32.5024307Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5032536Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5037487Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5038475Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5039315Z #22 537.3 ptxas info : Compile time = 0.855 ms 2025-09-07T06:34:32.5043625Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5053295Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 
2025-09-07T06:34:32.5057478Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5058469Z #22 537.3 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:34:32.5059307Z #22 537.3 ptxas info : Compile time = 25.886 ms 2025-09-07T06:34:32.5063952Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5071787Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5076209Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5077151Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5077846Z #22 537.3 ptxas info : Compile time = 0.926 ms 2025-09-07T06:34:32.5080435Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5084077Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5086267Z #22 537.3 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5087151Z #22 537.3 ptxas info : Used 61 registers, used 0 barriers 2025-09-07T06:34:32.5087913Z #22 537.3 ptxas info : Compile time = 29.373 ms 2025-09-07T06:34:32.5092400Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5100895Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5105159Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5106121Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5106931Z #22 537.3 ptxas info : Compile time = 0.978 ms 2025-09-07T06:34:32.5111975Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5119939Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5123643Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5124355Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5124931Z #22 537.3 ptxas info : Compile time = 0.773 ms 2025-09-07T06:34:32.5128171Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5134186Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5137463Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5138168Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5139062Z #22 537.3 ptxas info : Compile time = 0.738 ms 2025-09-07T06:34:32.5142677Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5149871Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5153275Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5153966Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5154551Z #22 537.3 ptxas info : Compile time = 0.650 ms 2025-09-07T06:34:32.5157979Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5165236Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5169149Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5169906Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5170525Z #22 537.3 ptxas info : Compile time = 0.657 ms 2025-09-07T06:34:32.5174372Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5181518Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5185809Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5186668Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5187394Z #22 537.3 ptxas info : Compile time = 0.620 ms 2025-09-07T06:34:32.5191570Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5199881Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5204332Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5205256Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5206018Z #22 537.3 ptxas info : Compile time = 0.638 ms 2025-09-07T06:34:32.5210571Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5218851Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5223339Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5224296Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5225327Z #22 537.3 ptxas info : Compile time = 0.610 ms 2025-09-07T06:34:32.5229852Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5237620Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5242310Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5243312Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5244143Z #22 537.3 ptxas info : Compile time = 0.595 ms 2025-09-07T06:34:32.5248594Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5256721Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5260864Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5262102Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5262869Z #22 537.3 ptxas info : Compile time = 0.651 ms 2025-09-07T06:34:32.5265048Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5268676Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5270971Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5271881Z #22 537.3 ptxas info : Used 124 registers, used 0 barriers 2025-09-07T06:34:32.5272625Z #22 537.3 ptxas info : Compile time = 92.068 ms 2025-09-07T06:34:32.5276991Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5284541Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5290132Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5291010Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5291927Z #22 537.3 ptxas info : Compile time = 0.913 ms 2025-09-07T06:34:32.5295768Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5303015Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 
2025-09-07T06:34:32.5307179Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5308128Z #22 537.3 ptxas info : Used 86 registers, used 1 barriers 2025-09-07T06:34:32.5308933Z #22 537.3 ptxas info : Compile time = 61.849 ms 2025-09-07T06:34:32.5313246Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5321540Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5325959Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5326871Z #22 537.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:32.5327615Z #22 537.3 ptxas info : Compile time = 0.937 ms 2025-09-07T06:34:32.5329965Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:32.5333931Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5336224Z #22 537.3 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads 2025-09-07T06:34:32.5337151Z #22 537.3 ptxas info : Used 122 registers, used 0 barriers 2025-09-07T06:34:32.5337988Z #22 537.3 ptxas info : Compile time = 61.996 ms 2025-09-07T06:34:32.5338786Z #22 537.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:34:32.5343287Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5392994Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5397035Z #22 537.3 184 bytes stack frame, 188 bytes spill stores, 232 bytes spill loads 2025-09-07T06:34:32.5398060Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 184 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5398916Z #22 537.3 ptxas info : Compile time = 510.468 ms 2025-09-07T06:34:32.5402312Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:34:32.5408525Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5412237Z #22 537.3 176 bytes stack frame, 184 bytes spill stores, 224 bytes spill loads 2025-09-07T06:34:32.5413224Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5414087Z #22 537.3 ptxas info : Compile time = 475.817 ms 2025-09-07T06:34:32.5418135Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5424354Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5427820Z #22 537.3 176 bytes stack frame, 188 bytes spill stores, 200 bytes spill loads 2025-09-07T06:34:32.5428827Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5429698Z #22 
537.3 ptxas info : Compile time = 580.518 ms 2025-09-07T06:34:32.5433118Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5439286Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5442949Z #22 537.3 176 bytes stack frame, 188 bytes spill stores, 196 bytes spill loads 2025-09-07T06:34:32.5443934Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5444790Z #22 537.3 ptxas info : Compile time = 482.352 ms 2025-09-07T06:34:32.5448188Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5454854Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5458295Z #22 537.3 208 bytes stack frame, 220 bytes spill stores, 272 bytes spill loads 2025-09-07T06:34:32.5459289Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5460138Z #22 537.3 ptxas info : Compile time = 615.514 ms 2025-09-07T06:34:32.5463563Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5470090Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5473542Z #22 537.3 208 bytes stack frame, 220 bytes spill stores, 264 bytes spill loads 2025-09-07T06:34:32.5474542Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5475403Z #22 537.3 ptxas info : Compile time = 531.957 ms 2025-09-07T06:34:32.5478822Z #22 537.3 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5485037Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5488527Z #22 537.3 216 bytes stack frame, 216 bytes spill stores, 232 bytes spill loads 2025-09-07T06:34:32.5489531Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5490737Z #22 537.3 ptxas info : Compile time = 621.231 ms 2025-09-07T06:34:32.5494261Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5500421Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5503863Z #22 537.3 216 bytes stack frame, 216 bytes spill stores, 228 bytes spill loads 2025-09-07T06:34:32.5504861Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5505733Z #22 537.3 ptxas info : Compile time = 540.051 ms 2025-09-07T06:34:32.5509077Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5515078Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5518690Z #22 537.3 216 bytes stack frame, 224 bytes spill stores, 264 bytes spill loads 2025-09-07T06:34:32.5519685Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5520546Z #22 537.3 ptxas info : Compile time = 583.959 ms 2025-09-07T06:34:32.5523849Z #22 537.3 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5529800Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5533342Z #22 537.3 216 bytes stack frame, 224 bytes spill stores, 256 bytes spill loads 2025-09-07T06:34:32.5534364Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5535211Z #22 537.3 ptxas info : Compile time = 506.117 ms 2025-09-07T06:34:32.5536980Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5540068Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5541837Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5542625Z #22 537.3 ptxas info : Used 61 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:32.5543287Z #22 537.3 ptxas info : Compile time = 26.760 ms 2025-09-07T06:34:32.5546685Z #22 537.3 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5553206Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5556663Z #22 537.3 208 bytes stack frame, 208 bytes spill stores, 220 bytes spill loads 2025-09-07T06:34:32.5557643Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5558505Z #22 537.3 ptxas info : Compile time = 603.308 ms 2025-09-07T06:34:32.5561508Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5567277Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5570347Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5571282Z #22 537.3 ptxas info : Used 51 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:32.5571950Z #22 537.3 ptxas info : Compile time = 20.294 ms 2025-09-07T06:34:32.5575384Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5581575Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5585015Z #22 537.3 200 bytes stack frame, 200 bytes spill stores, 208 bytes spill loads 2025-09-07T06:34:32.5586047Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5587382Z #22 537.3 ptxas info : Compile time = 512.263 ms 2025-09-07T06:34:32.5589134Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5591980Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5593742Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5594521Z #22 537.3 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:32.5595207Z #22 537.3 ptxas info : Compile time = 29.031 ms 2025-09-07T06:34:32.5598549Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5604627Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5608012Z #22 537.3 24 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:34:32.5608988Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5609881Z #22 537.3 ptxas info : Compile time = 717.053 ms 2025-09-07T06:34:32.5613673Z #22 537.3 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5619671Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5623012Z #22 537.3 56 bytes stack frame, 52 bytes spill stores, 76 bytes spill loads 2025-09-07T06:34:32.5623985Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5624867Z #22 537.3 ptxas info : Compile time = 655.716 ms 2025-09-07T06:34:32.5628206Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5634272Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5637868Z #22 537.3 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:34:32.5638828Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5639677Z #22 537.3 ptxas info : Compile time = 769.013 ms 2025-09-07T06:34:32.5643024Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5649425Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5652855Z #22 537.3 56 bytes stack frame, 56 bytes spill stores, 56 bytes spill loads 2025-09-07T06:34:32.5653816Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5654663Z #22 537.3 ptxas info : Compile time = 687.109 ms 2025-09-07T06:34:32.5658008Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5664330Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5667708Z #22 537.3 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:34:32.5668655Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5669510Z #22 537.3 ptxas info : Compile time = 854.100 ms 2025-09-07T06:34:32.5672856Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5678889Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5682234Z #22 537.3 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:34:32.5683518Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5684351Z #22 537.3 ptxas info : Compile time = 759.331 ms 2025-09-07T06:34:32.5687684Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5693858Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5697191Z #22 537.3 64 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:34:32.5698164Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5699028Z #22 537.3 ptxas info : Compile time = 824.573 ms 2025-09-07T06:34:32.5702357Z #22 537.3 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5708549Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5711891Z #22 537.3 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:34:32.5712832Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5713683Z #22 537.3 ptxas info : Compile time = 706.216 ms 2025-09-07T06:34:32.5716904Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5722747Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5726028Z #22 537.3 72 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:34:32.5727004Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5727851Z #22 537.3 ptxas info : Compile time = 723.422 ms 2025-09-07T06:34:32.5731307Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5737282Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5740519Z #22 537.3 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 2025-09-07T06:34:32.5741472Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5742336Z #22 537.3 ptxas info : Compile time = 630.084 ms 2025-09-07T06:34:32.5744077Z #22 537.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5746957Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5749062Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5749845Z #22 537.3 ptxas info : Used 119 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:32.5750526Z #22 537.3 ptxas info : Compile time = 51.618 ms 2025-09-07T06:34:32.5753853Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5760191Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5763578Z #22 537.3 56 bytes stack frame, 60 bytes spill stores, 64 bytes spill loads 2025-09-07T06:34:32.5764541Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5765375Z #22 537.3 ptxas info : Compile time = 747.792 ms 2025-09-07T06:34:32.5768472Z #22 537.3 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5774151Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5777250Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5778226Z #22 537.3 ptxas info : Used 86 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:32.5778992Z #22 537.3 ptxas info : Compile time = 31.400 ms 2025-09-07T06:34:32.5782329Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5788342Z #22 537.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:32.5791701Z #22 537.3 56 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads 2025-09-07T06:34:32.5792670Z #22 537.3 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:32.5793526Z #22 537.3 ptxas info : Compile time = 632.798 ms 2025-09-07T06:34:32.5795284Z #22 537.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:32.5798114Z #22 537.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:32.5799881Z #22 537.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:32.5800689Z #22 537.3 ptxas info : Used 121 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:32.5801370Z #22 537.3 ptxas info : Compile time = 57.022 ms 2025-09-07T06:34:36.5353100Z #22 541.3 [38/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:36.6950126Z #22 541.3 ptxas info : 11 bytes gmem 2025-09-07T06:34:36.6954568Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.6962277Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.6966619Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.6967504Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.6968236Z #22 541.3 ptxas info : Compile time = 2.213 ms 2025-09-07T06:34:36.6972625Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.6980330Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.6985260Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.6986162Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.6986860Z #22 541.3 ptxas info : Compile time = 1.075 ms 2025-09-07T06:34:36.6990865Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.6998676Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7003241Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7004132Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7004872Z #22 541.3 ptxas info : Compile time = 0.706 ms 2025-09-07T06:34:36.7009081Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7017069Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7021066Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7021882Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7022578Z #22 541.3 ptxas info : Compile time = 0.657 ms 2025-09-07T06:34:36.7026605Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7033807Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7037760Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7038583Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7039255Z #22 541.3 ptxas info : Compile time = 0.654 ms 2025-09-07T06:34:36.7043803Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7051926Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7056168Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7057035Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7057794Z #22 541.3 ptxas info : Compile time = 0.654 ms 2025-09-07T06:34:36.7062047Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7070051Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7074753Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7075604Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7076348Z #22 541.3 ptxas info : Compile time = 0.641 ms 2025-09-07T06:34:36.7080409Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7087679Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7091896Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7092731Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7093393Z #22 541.3 ptxas info : Compile time = 0.614 ms 2025-09-07T06:34:36.7097255Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7104454Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7108543Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7109428Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7110178Z #22 541.3 ptxas info : Compile time = 0.692 ms 2025-09-07T06:34:36.7114570Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7122115Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7126133Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7126974Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7127660Z #22 541.3 ptxas info : Compile time = 0.593 ms 2025-09-07T06:34:36.7131849Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7139352Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:34:36.7143280Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7144105Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7144777Z #22 541.3 ptxas info : Compile time = 0.611 ms 2025-09-07T06:34:36.7149134Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7156915Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7161252Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7162149Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7162874Z #22 541.3 ptxas info : Compile time = 0.608 ms 2025-09-07T06:34:36.7167763Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7175743Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7180124Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7180993Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7181709Z #22 541.3 ptxas info : Compile time = 0.651 ms 2025-09-07T06:34:36.7186040Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7193945Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7198824Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7199681Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7200395Z #22 541.3 ptxas info : Compile time = 0.615 ms 2025-09-07T06:34:36.7204570Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7212419Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7216852Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7217731Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7218441Z #22 541.3 ptxas info : Compile time = 0.517 ms 2025-09-07T06:34:36.7222816Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7231059Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7235244Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7236062Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7236746Z #22 541.3 ptxas info : Compile time = 0.513 ms 2025-09-07T06:34:36.7240948Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7249516Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7254019Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7255087Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7255921Z #22 541.3 ptxas info : Compile time = 0.508 ms 2025-09-07T06:34:36.7261231Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7270356Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7274922Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7275916Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7276749Z #22 541.3 ptxas info : Compile time = 0.529 ms 2025-09-07T06:34:36.7281390Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7289727Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7294865Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7295842Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7297073Z #22 541.3 ptxas info : Compile time = 0.520 ms 2025-09-07T06:34:36.7301635Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7309847Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7314785Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7315799Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7316578Z #22 541.3 ptxas info : Compile time = 0.516 ms 2025-09-07T06:34:36.7321216Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7330143Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7334944Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7335943Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7336779Z #22 541.3 ptxas info : Compile time = 0.511 ms 2025-09-07T06:34:36.7341461Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7352720Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:34:36.7357779Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7358871Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7359739Z #22 541.3 ptxas info : Compile time = 0.512 ms 2025-09-07T06:34:36.7362085Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7366297Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:36.7368722Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7369719Z #22 541.3 ptxas info : Used 90 registers, used 0 barriers 2025-09-07T06:34:36.7370488Z #22 541.3 ptxas info : Compile time = 34.414 ms 2025-09-07T06:34:36.7375572Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7383742Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7389142Z #22 541.3 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7390062Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7390939Z #22 541.3 ptxas info : Compile time = 0.898 ms 2025-09-07T06:34:36.7395219Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7403565Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:36.7407921Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7408714Z #22 541.3 ptxas info : Used 80 registers, used 1 barriers 2025-09-07T06:34:36.7409403Z #22 541.3 ptxas info : Compile time = 51.572 ms 2025-09-07T06:34:36.7413278Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7420550Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:36.7424854Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7425772Z #22 541.3 ptxas info : Used 70 registers, used 1 barriers 2025-09-07T06:34:36.7426494Z #22 541.3 ptxas info : Compile time = 59.902 ms 2025-09-07T06:34:36.7431096Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7438580Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7442643Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7443467Z #22 541.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:36.7444151Z #22 541.3 ptxas info : Compile time = 0.910 ms 2025-09-07T06:34:36.7446200Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:36.7449961Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:36.7452129Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7452939Z #22 541.3 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:34:36.7453625Z #22 541.3 ptxas info : Compile time = 77.033 ms 2025-09-07T06:34:36.7454601Z #22 541.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:34:36.7458672Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7466117Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7470445Z #22 541.3 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:36.7471633Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7472743Z #22 541.3 ptxas info : Compile time = 583.793 ms 
2025-09-07T06:34:36.7476931Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7484113Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7488260Z #22 541.3 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:34:36.7489865Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7490926Z #22 541.3 ptxas info : Compile time = 549.562 ms 2025-09-07T06:34:36.7495409Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7503029Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7507359Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7508352Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7509187Z #22 541.3 ptxas info : Compile time = 607.945 ms 2025-09-07T06:34:36.7513314Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7521404Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7525550Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7526501Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7527270Z #22 541.3 ptxas info : Compile time = 559.869 ms 2025-09-07T06:34:36.7531807Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7538979Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7542583Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7543413Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7544121Z #22 541.3 ptxas info : Compile time = 697.302 ms 2025-09-07T06:34:36.7547982Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7555864Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7560426Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7561522Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7562452Z #22 541.3 ptxas info : Compile time = 658.918 ms 2025-09-07T06:34:36.7566936Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7599232Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7605060Z #22 541.3 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:34:36.7606591Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7607809Z #22 541.3 ptxas info : Compile time = 730.130 ms 2025-09-07T06:34:36.7612907Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7622123Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7627206Z #22 541.3 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:34:36.7628686Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7630014Z #22 541.3 ptxas info : Compile time = 670.393 ms 2025-09-07T06:34:36.7634845Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7644069Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7649145Z #22 541.3 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:36.7650636Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7652061Z #22 541.3 ptxas info : Compile time = 589.399 ms 2025-09-07T06:34:36.7656949Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7665207Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7670051Z #22 541.3 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:36.7671474Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7672740Z #22 541.3 ptxas info : Compile time = 542.428 ms 2025-09-07T06:34:36.7678015Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7686985Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7692199Z #22 541.3 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:36.7693547Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7694721Z #22 541.3 ptxas info : Compile time = 612.043 ms 2025-09-07T06:34:36.7699751Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7708755Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7713796Z #22 541.3 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:36.7715489Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7716694Z #22 541.3 ptxas info : Compile time = 562.614 ms 2025-09-07T06:34:36.7721818Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7730978Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7736225Z #22 541.3 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads 2025-09-07T06:34:36.7737642Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7738873Z #22 541.3 ptxas info : Compile time = 917.031 ms 2025-09-07T06:34:36.7744006Z #22 541.3 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7854536Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7859779Z #22 541.3 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:34:36.7861308Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.7862625Z #22 541.3 ptxas info : Compile time = 739.103 ms 2025-09-07T06:34:36.7867965Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7877265Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7882659Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7883770Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7884805Z #22 541.3 ptxas info : Compile time = 965.558 ms 2025-09-07T06:34:36.7890631Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7900317Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7905726Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7906914Z #22 541.3 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7907984Z #22 541.3 ptxas info : Compile time = 639.413 ms 2025-09-07T06:34:36.7913058Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7922366Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7927887Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7929107Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7930095Z #22 541.3 ptxas info : Compile time = 953.079 ms 2025-09-07T06:34:36.7935449Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7944572Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7949935Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7951070Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7952088Z #22 541.3 ptxas info : Compile time = 753.472 ms 2025-09-07T06:34:36.7957265Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7969281Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7974642Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7975805Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7976824Z #22 541.3 ptxas info : Compile time = 988.707 ms 2025-09-07T06:34:36.7981910Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.7991180Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.7996320Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.7997462Z #22 541.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.7998747Z #22 541.3 ptxas info : Compile time = 673.658 ms 2025-09-07T06:34:36.8003689Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8012822Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.8017801Z #22 541.3 24 bytes stack frame, 24 bytes spill stores, 28 bytes spill loads 2025-09-07T06:34:36.8019246Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.8020495Z #22 541.3 ptxas info : Compile time = 921.330 ms 2025-09-07T06:34:36.8025354Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8034074Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.8038778Z #22 541.3 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:34:36.8040202Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.8041310Z #22 541.3 ptxas info : Compile time = 719.306 ms 2025-09-07T06:34:36.8043770Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8048022Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:36.8050994Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.8052249Z #22 541.3 ptxas info : Used 84 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:36.8053215Z #22 541.3 ptxas info : Compile time = 32.452 ms 2025-09-07T06:34:36.8058337Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8067588Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.8072934Z #22 541.3 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:34:36.8074318Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:36.8075533Z #22 541.3 ptxas info : Compile time = 936.737 ms 2025-09-07T06:34:36.8080068Z #22 541.3 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8088371Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:36.8093229Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.8094365Z #22 541.3 ptxas info : Used 80 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:36.8095289Z #22 541.3 ptxas info : Compile time = 43.255 ms 2025-09-07T06:34:36.8099941Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8108555Z #22 541.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:36.8113158Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.8114299Z #22 541.3 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:36.8115222Z #22 541.3 ptxas info : Compile time = 26.445 ms 2025-09-07T06:34:36.8120309Z #22 541.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8128888Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:36.8134193Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.8135375Z #22 541.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:34:36.8136515Z #22 541.3 ptxas info : Compile time = 642.997 ms 2025-09-07T06:34:36.8139238Z #22 541.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:36.8143596Z #22 541.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:36.8146223Z #22 541.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:36.8147373Z #22 541.3 ptxas info : Used 87 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:36.8148386Z #22 541.3 ptxas info : Compile time = 38.699 ms 2025-09-07T06:34:40.3808034Z #22 545.2 [39/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 
--ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:40.3826869Z #22 545.2 ptxas info : 11 bytes gmem 2025-09-07T06:34:40.3831382Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3839792Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3844284Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3845386Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3846581Z #22 545.2 ptxas info : Compile time = 2.004 ms 
2025-09-07T06:34:40.3851558Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3858449Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3862053Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3862835Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3863488Z #22 545.2 ptxas info : Compile time = 0.963 ms 2025-09-07T06:34:40.3867382Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3874098Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3877757Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3878883Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3879527Z #22 545.2 ptxas info : Compile time = 0.653 ms 2025-09-07T06:34:40.3883335Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3890267Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3894214Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3894965Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3895571Z #22 545.2 ptxas info : Compile time = 0.671 ms 2025-09-07T06:34:40.3899363Z #22 545.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3906627Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3910455Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3911202Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3911849Z #22 545.2 ptxas info : Compile time = 21.085 ms 2025-09-07T06:34:40.3915651Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3922716Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3926557Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3927332Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3927966Z #22 545.2 ptxas info : Compile time = 0.967 ms 2025-09-07T06:34:40.3932238Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3939140Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3942957Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3943728Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3944367Z #22 545.2 ptxas info : Compile time = 0.792 ms 2025-09-07T06:34:40.3947971Z #22 545.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3954730Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3958723Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3959462Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3960086Z #22 545.2 ptxas info : Compile time = 0.723 ms 2025-09-07T06:34:40.3963661Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3970222Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3973984Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3974729Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3975332Z #22 545.2 ptxas info : Compile time = 0.675 ms 2025-09-07T06:34:40.3979147Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.3986434Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.3990306Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.3991060Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.3991707Z #22 545.2 ptxas info : Compile time = 0.640 ms 2025-09-07T06:34:40.3995484Z #22 545.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4002334Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4006096Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4007122Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4007718Z #22 545.2 ptxas info : Compile time = 0.585 ms 2025-09-07T06:34:40.4011581Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4018402Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4022246Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4023037Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4023647Z #22 545.2 ptxas info : Compile time = 0.569 ms 2025-09-07T06:34:40.4027436Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4034731Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4038517Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4039261Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4039884Z #22 545.2 ptxas info : Compile time = 0.603 ms 2025-09-07T06:34:40.4043590Z #22 545.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4151435Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4155507Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4156250Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4156885Z #22 545.2 ptxas info : Compile time = 0.616 ms 2025-09-07T06:34:40.4160699Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4168118Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4171992Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4172739Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4173363Z #22 545.2 ptxas info : Compile time = 0.607 ms 2025-09-07T06:34:40.4177266Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4184439Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4188465Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4189567Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4190222Z #22 545.2 ptxas info : Compile time = 0.659 ms 2025-09-07T06:34:40.4194043Z #22 545.2 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4200962Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4204763Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4205526Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4206151Z #22 545.2 ptxas info : Compile time = 0.632 ms 2025-09-07T06:34:40.4210032Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:40.4217401Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4221258Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4221991Z #22 545.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:40.4222632Z #22 545.2 ptxas info : Compile time = 0.625 ms 2025-09-07T06:34:40.4223265Z #22 545.2 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:34:40.4226918Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4233535Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4237277Z #22 545.2 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:40.4238356Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4239330Z #22 545.2 ptxas info : Compile 
time = 943.803 ms 2025-09-07T06:34:40.4243432Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4250569Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4255136Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4256352Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:40.4257255Z #22 545.2 ptxas info : Compile time = 873.424 ms 2025-09-07T06:34:40.4261705Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4269375Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4273347Z #22 545.2 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:40.4274370Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4275276Z #22 545.2 ptxas info : Compile time = 2005.693 ms 2025-09-07T06:34:40.4279092Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4286109Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4290045Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4290890Z #22 545.2 ptxas info : Used 248 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:40.4291764Z #22 545.2 ptxas info : Compile time = 1551.959 ms 
2025-09-07T06:34:40.4295912Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4302781Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4306653Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4307477Z #22 545.2 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:40.4308232Z #22 545.2 ptxas info : Compile time = 1683.742 ms 2025-09-07T06:34:40.4312075Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4319036Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4323059Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4323923Z #22 545.2 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:40.4324651Z #22 545.2 ptxas info : Compile time = 3007.854 ms 2025-09-07T06:34:40.4328431Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4335468Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4339283Z #22 545.2 128 bytes stack frame, 148 bytes spill stores, 316 bytes spill loads 2025-09-07T06:34:40.4340349Z #22 545.2 ptxas info : Used 255 registers, used 6 barriers, 128 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:34:40.4341273Z #22 545.2 ptxas info : Compile time = 1762.280 ms 
2025-09-07T06:34:40.4344859Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4352011Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4355632Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4356451Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:40.4357194Z #22 545.2 ptxas info : Compile time = 1611.472 ms 2025-09-07T06:34:40.4360882Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4367344Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4371032Z #22 545.2 96 bytes stack frame, 132 bytes spill stores, 244 bytes spill loads 2025-09-07T06:34:40.4372221Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4373469Z #22 545.2 ptxas info : Compile time = 3303.150 ms 2025-09-07T06:34:40.4377224Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4384012Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4387758Z #22 545.2 88 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:34:40.4388813Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4389765Z #22 545.2 ptxas info : Compile 
time = 1030.167 ms 2025-09-07T06:34:40.4393624Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4400888Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4404876Z #22 545.2 144 bytes stack frame, 164 bytes spill stores, 192 bytes spill loads 2025-09-07T06:34:40.4405943Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4406879Z #22 545.2 ptxas info : Compile time = 1079.190 ms 2025-09-07T06:34:40.4410682Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4417871Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4421680Z #22 545.2 168 bytes stack frame, 220 bytes spill stores, 260 bytes spill loads 2025-09-07T06:34:40.4422726Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 168 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4423668Z #22 545.2 ptxas info : Compile time = 2578.439 ms 2025-09-07T06:34:40.4427676Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4434482Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4438315Z #22 545.2 128 bytes stack frame, 168 bytes spill stores, 244 bytes spill loads 2025-09-07T06:34:40.4439394Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 128 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4440326Z 
#22 545.2 ptxas info : Compile time = 2329.985 ms 2025-09-07T06:34:40.4444057Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4451535Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4455320Z #22 545.2 120 bytes stack frame, 176 bytes spill stores, 248 bytes spill loads 2025-09-07T06:34:40.4456368Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4457272Z #22 545.2 ptxas info : Compile time = 2447.741 ms 2025-09-07T06:34:40.4460992Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4467889Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4471943Z #22 545.2 248 bytes stack frame, 196 bytes spill stores, 436 bytes spill loads 2025-09-07T06:34:40.4472986Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 248 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.4473886Z #22 545.2 ptxas info : Compile time = 5574.519 ms 2025-09-07T06:34:40.4479285Z #22 545.2 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi48EEES4_EEELi128EN7cutlass10bfloat16_tEfNS7_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:34:40.4485193Z #22 545.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:40.4489285Z #22 545.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4496705Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4500675Z #22 545.2 176 bytes stack frame, 260 bytes spill stores, 656 bytes spill loads 2025-09-07T06:34:40.4501761Z #22 545.2 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:34:40.4502710Z #22 545.2 ptxas info : Compile time = 2296.143 ms 2025-09-07T06:34:40.4506780Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.4513662Z #22 545.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.4517529Z #22 545.2 200 bytes stack frame, 276 bytes spill stores, 364 bytes spill loads 2025-09-07T06:34:40.4518578Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.5297635Z #22 545.2 ptxas info : Compile time = 2037.921 ms 2025-09-07T06:34:40.5301491Z #22 545.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:40.5308438Z #22 545.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:40.5312623Z #22 545.2 160 bytes stack frame, 240 bytes spill stores, 328 bytes spill loads 2025-09-07T06:34:40.5313667Z #22 545.2 ptxas info : Used 255 registers, used 2 barriers, 160 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:40.5314574Z 
#22 545.2 ptxas info : Compile time = 4845.255 ms 2025-09-07T06:34:55.4209319Z #22 560.2 [40/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:55.5707748Z #22 560.2 ptxas info : 11 bytes gmem 2025-09-07T06:34:55.5712364Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5720998Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5725924Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5726791Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5727567Z #22 560.2 ptxas info : Compile time = 2.163 ms 2025-09-07T06:34:55.5732505Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5741452Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5745959Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5746865Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5747627Z #22 560.2 ptxas info : Compile time = 1.047 ms 2025-09-07T06:34:55.5752693Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5761359Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5766091Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5767110Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5767898Z #22 560.2 ptxas info : Compile time = 0.723 ms 2025-09-07T06:34:55.5772935Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5781415Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5786053Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5786940Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5787706Z #22 560.2 ptxas info : Compile time = 0.664 ms 2025-09-07T06:34:55.5792152Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5854684Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5860236Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5861223Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5861996Z #22 560.2 ptxas info : Compile time = 0.651 ms 2025-09-07T06:34:55.5867072Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5875272Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5879826Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5880767Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5881592Z #22 560.2 ptxas info : Compile time = 0.694 ms 2025-09-07T06:34:55.5886083Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5895706Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5900863Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5901906Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5902749Z #22 560.2 ptxas info : Compile time = 0.641 ms 2025-09-07T06:34:55.5907829Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5917098Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5922201Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5923229Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5924246Z #22 560.2 ptxas info : Compile time = 0.619 ms 2025-09-07T06:34:55.5929195Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5938192Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5943017Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5943950Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5944792Z #22 560.2 ptxas info : Compile time = 0.664 ms 2025-09-07T06:34:55.5950020Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5958840Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5963486Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5964797Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5965648Z #22 560.2 ptxas info : Compile time = 0.601 ms 2025-09-07T06:34:55.5970775Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.5980239Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.5985377Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.5986410Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.5987275Z #22 560.2 ptxas info : Compile time = 0.623 ms 2025-09-07T06:34:55.5991894Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6000579Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6005380Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6006399Z #22 560.2 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:34:55.6007279Z #22 560.2 ptxas info : Compile time = 40.874 ms 2025-09-07T06:34:55.6012638Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6021760Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6026803Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6027830Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6028643Z #22 560.2 ptxas info : Compile time = 1.075 ms 2025-09-07T06:34:55.6033831Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6043066Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6047900Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6149809Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6150848Z #22 560.2 ptxas info : Compile time = 0.768 ms 2025-09-07T06:34:55.6155726Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6164811Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6170108Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6171443Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6172295Z #22 560.2 ptxas info : Compile time = 0.646 ms 2025-09-07T06:34:55.6177342Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6186414Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6191525Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6192553Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6193401Z #22 560.2 ptxas info : Compile time = 1.058 ms 2025-09-07T06:34:55.6198397Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6207826Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6213029Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6214061Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6214901Z #22 560.2 ptxas info : Compile time = 0.664 ms 2025-09-07T06:34:55.6219921Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6229146Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6234112Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6235072Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6235883Z #22 560.2 ptxas info : Compile time = 0.696 ms 2025-09-07T06:34:55.6240903Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6250470Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6255655Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6256686Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6257543Z #22 560.2 ptxas info : Compile time = 0.643 ms 2025-09-07T06:34:55.6262377Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6271506Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6276544Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6277563Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6278426Z #22 560.2 ptxas info : Compile time = 0.710 ms 2025-09-07T06:34:55.6283693Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6292804Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6297652Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6298532Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6299380Z #22 560.2 ptxas info : Compile time = 0.668 ms 2025-09-07T06:34:55.6304175Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6313040Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6318286Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6319319Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6320197Z #22 560.2 ptxas info : Compile time = 0.640 ms 2025-09-07T06:34:55.6325104Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6333694Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6338528Z #22 560.2 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6339554Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6340390Z #22 560.2 ptxas info : Compile time = 0.670 ms 2025-09-07T06:34:55.6343022Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6347270Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6350161Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6351208Z #22 560.2 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:34:55.6352047Z #22 560.2 ptxas info : Compile time = 36.276 ms 2025-09-07T06:34:55.6357322Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6366430Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6371606Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:34:55.6372647Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6373503Z #22 560.2 ptxas info : Compile time = 1.008 ms 2025-09-07T06:34:55.6377972Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6386012Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6390905Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6391936Z #22 560.2 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:34:55.6392789Z #22 560.2 ptxas info : Compile time = 51.115 ms 2025-09-07T06:34:55.6397418Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6405721Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6410313Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6411496Z #22 560.2 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:34:55.6412345Z #22 560.2 ptxas info : Compile time = 29.118 ms 2025-09-07T06:34:55.6417289Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6426524Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6431503Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6432538Z #22 560.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:55.6433354Z #22 560.2 ptxas info : Compile time = 0.985 ms 2025-09-07T06:34:55.6435825Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:55.6440132Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6442736Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6443725Z #22 560.2 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:34:55.6444590Z #22 560.2 ptxas info : Compile time = 41.472 ms 2025-09-07T06:34:55.6445417Z #22 560.2 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:34:55.6450513Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6459620Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6464727Z #22 560.2 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:34:55.6466128Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6467398Z #22 560.2 ptxas info : Compile time = 
660.607 ms 2025-09-07T06:34:55.6472567Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6481700Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6486820Z #22 560.2 24 bytes stack frame, 24 bytes spill stores, 32 bytes spill loads 2025-09-07T06:34:55.6488234Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6489487Z #22 560.2 ptxas info : Compile time = 587.194 ms 2025-09-07T06:34:55.6495058Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6504266Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6509516Z #22 560.2 40 bytes stack frame, 40 bytes spill stores, 44 bytes spill loads 2025-09-07T06:34:55.6510970Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6512253Z #22 560.2 ptxas info : Compile time = 692.867 ms 2025-09-07T06:34:55.6517205Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6526536Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6532072Z #22 560.2 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:34:55.6533503Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6534777Z #22 560.2 ptxas info : Compile time = 619.963 ms 2025-09-07T06:34:55.6539975Z #22 560.2 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6549207Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6554364Z #22 560.2 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:34:55.6555806Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6557075Z #22 560.2 ptxas info : Compile time = 749.951 ms 2025-09-07T06:34:55.6562506Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6572025Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6577213Z #22 560.2 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:34:55.6578638Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6579940Z #22 560.2 ptxas info : Compile time = 703.870 ms 2025-09-07T06:34:55.6585130Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6594610Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6599685Z #22 560.2 48 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:34:55.6601234Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6602523Z #22 560.2 ptxas info : Compile time = 785.521 ms 2025-09-07T06:34:55.6607466Z #22 560.2 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6616906Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6621975Z #22 560.2 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:34:55.6623385Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6624631Z #22 560.2 ptxas info : Compile time = 704.481 ms 2025-09-07T06:34:55.6629421Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6637320Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6642354Z #22 560.2 64 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:34:55.6643809Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6645061Z #22 560.2 ptxas info : Compile time = 758.514 ms 2025-09-07T06:34:55.6650331Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6659250Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6663975Z #22 560.2 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:34:55.6665381Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6666668Z #22 560.2 ptxas info : Compile time = 665.298 ms 2025-09-07T06:34:55.6672048Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6681571Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6686632Z #22 560.2 48 bytes stack frame, 48 bytes spill stores, 56 bytes spill loads 2025-09-07T06:34:55.6687993Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6689279Z #22 560.2 ptxas info : Compile time = 768.884 ms 2025-09-07T06:34:55.6694134Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6702771Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6707575Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6709033Z #22 560.2 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:55.6710048Z #22 560.2 ptxas info : Compile time = 30.213 ms 2025-09-07T06:34:55.6715237Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6724628Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6729730Z #22 560.2 48 bytes stack frame, 48 bytes spill stores, 52 bytes spill loads 2025-09-07T06:34:55.6731200Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6732432Z #22 560.2 ptxas info : Compile time = 693.321 ms 2025-09-07T06:34:55.6737378Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6746666Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6751919Z #22 560.2 72 bytes stack frame, 68 bytes spill stores, 96 bytes spill loads 2025-09-07T06:34:55.6753346Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6754568Z #22 560.2 ptxas info : Compile time = 829.038 ms 2025-09-07T06:34:55.6759553Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6768551Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6773664Z #22 560.2 48 bytes stack frame, 44 bytes spill stores, 60 bytes spill loads 2025-09-07T06:34:55.6775054Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6776316Z #22 560.2 ptxas info : Compile time = 800.128 ms 2025-09-07T06:34:55.6781634Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6790682Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6795744Z #22 560.2 80 bytes stack frame, 80 bytes spill stores, 84 bytes spill loads 2025-09-07T06:34:55.6797181Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6798458Z #22 560.2 ptxas info : Compile time = 912.974 ms 2025-09-07T06:34:55.6803279Z #22 560.2 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6812294Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6817636Z #22 560.2 80 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:34:55.6819149Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6820463Z #22 560.2 ptxas info : Compile time = 764.345 ms 2025-09-07T06:34:55.6825519Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6834620Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6839483Z #22 560.2 64 bytes stack frame, 64 bytes spill stores, 88 bytes spill loads 2025-09-07T06:34:55.6840949Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6842153Z #22 560.2 ptxas info : Compile time = 922.598 ms 2025-09-07T06:34:55.6847113Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6856784Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6861810Z #22 560.2 72 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:34:55.6863242Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6864509Z #22 560.2 ptxas info : Compile time = 810.570 ms 2025-09-07T06:34:55.6869623Z #22 560.2 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6878869Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6883980Z #22 560.2 88 bytes stack frame, 88 bytes spill stores, 92 bytes spill loads 2025-09-07T06:34:55.6885421Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6886878Z #22 560.2 ptxas info : Compile time = 968.814 ms 2025-09-07T06:34:55.6892063Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6901259Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6906401Z #22 560.2 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads 2025-09-07T06:34:55.6907859Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6909055Z #22 560.2 ptxas info : Compile time = 841.976 ms 2025-09-07T06:34:55.6913861Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6922440Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6927445Z #22 560.2 104 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:34:55.6928913Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6930137Z #22 560.2 ptxas info : Compile time = 846.652 ms 2025-09-07T06:34:55.6935085Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6943768Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6948664Z #22 560.2 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:34:55.6950307Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6951542Z #22 560.2 ptxas info : Compile time = 752.866 ms 2025-09-07T06:34:55.6954061Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6958651Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.6961313Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.6962473Z #22 560.2 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:55.6963451Z #22 560.2 ptxas info : Compile time = 33.315 ms 2025-09-07T06:34:55.6968465Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6977534Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.6982625Z #22 560.2 96 bytes stack frame, 92 bytes spill stores, 96 bytes spill loads 2025-09-07T06:34:55.6984066Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.6985339Z #22 560.2 ptxas info : Compile time = 957.283 ms 2025-09-07T06:34:55.6989984Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.6998513Z #22 560.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.7003223Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.7004349Z #22 560.2 ptxas info : Used 90 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:55.7005340Z #22 560.2 ptxas info : Compile time = 38.311 ms 2025-09-07T06:34:55.7010024Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.7018384Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:34:55.7022974Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.7024121Z #22 560.2 ptxas info : Used 55 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:34:55.7025233Z #22 560.2 ptxas info : Compile time = 21.597 ms 2025-09-07T06:34:55.7030301Z #22 560.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.7039127Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:55.7044160Z #22 560.2 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:34:55.7045604Z #22 560.2 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:34:55.7046816Z #22 560.2 ptxas info : Compile time = 766.473 ms 2025-09-07T06:34:55.7049179Z #22 560.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:55.7052959Z #22 560.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:34:55.7055511Z #22 560.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:55.7056640Z #22 560.2 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:34:55.7057646Z #22 560.2 ptxas info : Compile time = 36.690 ms 2025-09-07T06:34:59.4659535Z #22 564.2 [41/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:59.4679074Z #22 564.2 ptxas info : 11 bytes 
gmem 2025-09-07T06:34:59.4683894Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4692859Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4697590Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4698553Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4699376Z #22 564.2 ptxas info : Compile time = 2.188 ms 2025-09-07T06:34:59.4704071Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4711637Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4716339Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4717619Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4718443Z #22 564.2 ptxas info : Compile time = 21.226 ms 2025-09-07T06:34:59.4723228Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4732041Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4736849Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4737843Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4738677Z #22 564.2 ptxas info : Compile time = 1.055 ms 2025-09-07T06:34:59.4743810Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4753596Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4758717Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4759703Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4760521Z #22 564.2 ptxas info : Compile time = 0.661 ms 2025-09-07T06:34:59.4765588Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4774925Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4779937Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4780921Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4781752Z #22 564.2 ptxas info : Compile time = 0.564 ms 2025-09-07T06:34:59.4788743Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4797952Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4803024Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4804008Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4804849Z #22 564.2 ptxas info : Compile time = 0.517 ms 2025-09-07T06:34:59.4809960Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4819297Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4824840Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4825988Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4827248Z #22 564.2 ptxas info : Compile time = 0.544 ms 2025-09-07T06:34:59.4832115Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4840965Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4846060Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4847213Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4848110Z #22 564.2 ptxas info : Compile time = 0.525 ms 2025-09-07T06:34:59.4853519Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4862688Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4867674Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4868918Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4869882Z #22 564.2 ptxas info : Compile time = 0.546 ms 2025-09-07T06:34:59.4875050Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4884511Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4889720Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4890834Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4892346Z #22 564.2 ptxas info : Compile time = 0.508 ms 2025-09-07T06:34:59.4897535Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4906858Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4912282Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4913373Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4914333Z #22 564.2 ptxas info : Compile time = 0.532 ms 2025-09-07T06:34:59.4919649Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.4929203Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.4934745Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.4936062Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.4936975Z #22 564.2 ptxas info : Compile time = 0.501 ms 2025-09-07T06:34:59.4941943Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.5052464Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5057837Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5058862Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.5060055Z #22 564.2 ptxas info : Compile time = 0.641 ms 2025-09-07T06:34:59.5065096Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.5074676Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5079685Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5080833Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.5081706Z #22 564.2 ptxas info : Compile time = 0.586 ms 2025-09-07T06:34:59.5086909Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.5096040Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5101192Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5102319Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.5103265Z #22 564.2 ptxas info : Compile time = 0.611 ms 2025-09-07T06:34:59.5108769Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.5118270Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5123567Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5124738Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.5125659Z #22 564.2 ptxas info : Compile time = 0.643 ms 2025-09-07T06:34:59.5130017Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.5138384Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5143313Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5144432Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.5145408Z #22 564.2 ptxas info : Compile time = 0.659 ms 2025-09-07T06:34:59.5150455Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:59.5158703Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5163425Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5164453Z #22 564.2 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:59.5165329Z #22 564.2 ptxas info : Compile time = 0.584 ms 2025-09-07T06:34:59.5166206Z #22 564.2 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 
2025-09-07T06:34:59.5171204Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5179507Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5184226Z #22 564.2 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:59.5185787Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5187089Z #22 564.2 ptxas info : Compile time = 643.650 ms 2025-09-07T06:34:59.5191805Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5200047Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5205042Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5206439Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:59.5207430Z #22 564.2 ptxas info : Compile time = 578.963 ms 2025-09-07T06:34:59.5211593Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5219731Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5224356Z #22 564.2 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:59.5225757Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5227222Z #22 564.2 ptxas info : Compile time = 1348.367 ms 2025-09-07T06:34:59.5232002Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5240930Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5245910Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5247104Z #22 564.2 ptxas info : Used 248 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:59.5248089Z #22 564.2 ptxas info : Compile time = 1076.324 ms 2025-09-07T06:34:59.5253546Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5262157Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5266853Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5268504Z #22 564.2 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:59.5269730Z #22 564.2 ptxas info : Compile time = 1119.613 ms 2025-09-07T06:34:59.5274661Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5283408Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5288222Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5289417Z #22 564.2 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:59.5290590Z #22 564.2 ptxas info : Compile time = 2333.595 ms 2025-09-07T06:34:59.5295636Z #22 564.2 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5304501Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5309388Z #22 564.2 128 bytes stack frame, 148 bytes spill stores, 316 bytes spill loads 2025-09-07T06:34:59.5310684Z #22 564.2 ptxas info : Used 255 registers, used 6 barriers, 128 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:34:59.5311828Z #22 564.2 ptxas info : Compile time = 2018.819 ms 2025-09-07T06:34:59.5316080Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5324026Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5328927Z #22 564.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:59.5330264Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:59.5331484Z #22 564.2 ptxas info : Compile time = 1272.869 ms 2025-09-07T06:34:59.5335631Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5343825Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5348275Z #22 564.2 96 bytes stack frame, 132 bytes spill stores, 244 bytes spill loads 2025-09-07T06:34:59.5350075Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5351514Z #22 564.2 ptxas info : Compile time = 3663.004 ms 2025-09-07T06:34:59.5356189Z #22 564.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5364359Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5369678Z #22 564.2 88 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:34:59.5371716Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5372964Z #22 564.2 ptxas info : Compile time = 1292.532 ms 2025-09-07T06:34:59.5377781Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5386611Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5391705Z #22 564.2 144 bytes stack frame, 164 bytes spill stores, 192 bytes spill loads 2025-09-07T06:34:59.5393494Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5394941Z #22 564.2 ptxas info : Compile time = 1391.278 ms 2025-09-07T06:34:59.5400032Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5409062Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5414143Z #22 564.2 168 bytes stack frame, 220 bytes spill stores, 260 bytes spill loads 2025-09-07T06:34:59.5415544Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 168 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5417161Z #22 564.2 ptxas 
info : Compile time = 3102.346 ms 2025-09-07T06:34:59.5421117Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5429814Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5434655Z #22 564.2 128 bytes stack frame, 168 bytes spill stores, 244 bytes spill loads 2025-09-07T06:34:59.5436134Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 128 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5437527Z #22 564.2 ptxas info : Compile time = 2613.112 ms 2025-09-07T06:34:59.5441805Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5449523Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5454592Z #22 564.2 120 bytes stack frame, 176 bytes spill stores, 248 bytes spill loads 2025-09-07T06:34:59.5456078Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5457345Z #22 564.2 ptxas info : Compile time = 2734.831 ms 2025-09-07T06:34:59.5462295Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5470594Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5475298Z #22 564.2 168 bytes stack frame, 228 bytes spill stores, 340 bytes spill loads 2025-09-07T06:34:59.5476874Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 168 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5478146Z #22 564.2 ptxas 
info : Compile time = 5216.590 ms 2025-09-07T06:34:59.5482965Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5492565Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5497852Z #22 564.2 176 bytes stack frame, 260 bytes spill stores, 656 bytes spill loads 2025-09-07T06:34:59.5499375Z #22 564.2 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:34:59.5500762Z #22 564.2 ptxas info : Compile time = 2649.399 ms 2025-09-07T06:34:59.5506246Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.5515080Z #22 564.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.5520258Z #22 564.2 200 bytes stack frame, 276 bytes spill stores, 364 bytes spill loads 2025-09-07T06:34:59.5521808Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.5523175Z #22 564.2 ptxas info : Compile time = 2435.866 ms 2025-09-07T06:34:59.6200226Z #22 564.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:59.6209356Z #22 564.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:59.6214745Z #22 564.2 160 bytes stack frame, 240 bytes spill stores, 328 bytes spill loads 2025-09-07T06:34:59.6216353Z #22 564.2 ptxas info : Used 255 registers, used 2 barriers, 160 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:59.6217529Z #22 564.2 ptxas 
info : Compile time = 5077.059 ms 2025-09-07T06:35:06.3437753Z #22 571.1 [42/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 
--generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:35:06.5104886Z #22 571.1 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:35:06.5110244Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5120428Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5125912Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5127071Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5128515Z #22 571.1 ptxas info : Compile time = 1.705 ms 2025-09-07T06:35:06.5144770Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5155807Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5161744Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5162917Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5163899Z #22 571.1 ptxas info : Compile time = 0.825 ms 2025-09-07T06:35:06.5169853Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5181333Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5187255Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5188407Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5189397Z #22 571.1 ptxas info : Compile time = 0.740 ms 2025-09-07T06:35:06.5194796Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5205016Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5210617Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5211935Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:35:06.5213204Z #22 571.1 ptxas info : Compile time = 0.493 ms 2025-09-07T06:35:06.5219165Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5229891Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5235782Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5236906Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5237874Z #22 571.1 ptxas info : Compile time = 0.493 ms 2025-09-07T06:35:06.5243824Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:35:06.5255441Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5261454Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5262585Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5263587Z #22 571.1 ptxas info : Compile time = 0.475 ms 2025-09-07T06:35:06.5275713Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5286138Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5291959Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5293135Z #22 571.1 ptxas info : 
Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5294443Z #22 571.1 ptxas info : Compile time = 0.470 ms 2025-09-07T06:35:06.5300359Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5311340Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5317020Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5318142Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5319107Z #22 571.1 ptxas info : Compile time = 0.456 ms 2025-09-07T06:35:06.5325026Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.5336315Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5342263Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5343422Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:35:06.5344435Z #22 571.1 ptxas info : Compile time = 0.459 ms 2025-09-07T06:35:06.5345169Z #22 571.1 ptxas info : 11 bytes gmem 2025-09-07T06:35:06.5350709Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5360804Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5366238Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5367539Z #22 571.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:35:06.5368387Z #22 571.1 ptxas info : Compile time = 573.078 ms 2025-09-07T06:35:06.5374371Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5385259Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5391305Z #22 571.1 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:35:06.5392608Z #22 571.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative 
stack size 2025-09-07T06:35:06.5393740Z #22 571.1 ptxas info : Compile time = 856.526 ms 2025-09-07T06:35:06.5399795Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5411407Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5417330Z #22 571.1 40 bytes stack frame, 88 bytes spill stores, 108 bytes spill loads 2025-09-07T06:35:06.5418633Z #22 571.1 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:35:06.5419728Z #22 571.1 ptxas info : Compile time = 2019.873 ms 2025-09-07T06:35:06.5425204Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:35:06.5435345Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5440924Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5442160Z #22 571.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:35:06.5443030Z #22 571.1 ptxas info : Compile time = 1146.077 ms 2025-09-07T06:35:06.5449032Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5459862Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5465883Z #22 571.1 16 bytes stack frame, 40 bytes spill stores, 32 bytes spill loads 
2025-09-07T06:35:06.5467165Z #22 571.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:35:06.5468306Z #22 571.1 ptxas info : Compile time = 1496.818 ms 2025-09-07T06:35:06.5474326Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5485693Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5491746Z #22 571.1 64 bytes stack frame, 92 bytes spill stores, 120 bytes spill loads 2025-09-07T06:35:06.5493054Z #22 571.1 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:35:06.5494180Z #22 571.1 ptxas info : Compile time = 3123.466 ms 2025-09-07T06:35:06.5499777Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5509716Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5515449Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5517184Z #22 571.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:35:06.5518074Z #22 571.1 ptxas info : Compile time = 816.474 ms 2025-09-07T06:35:06.5524039Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5535109Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5541152Z #22 571.1 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:35:06.5542442Z #22 571.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:35:06.5543564Z #22 571.1 ptxas info : Compile time = 1120.680 ms 2025-09-07T06:35:06.5553458Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5564761Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5570696Z #22 571.1 40 bytes stack frame, 96 bytes spill stores, 116 bytes spill loads 2025-09-07T06:35:06.5572119Z #22 571.1 ptxas 
info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:35:06.5573250Z #22 571.1 ptxas info : Compile time = 2527.170 ms 2025-09-07T06:35:06.5592728Z #22 571.1 [43/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a 
-DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:35:06.5612713Z #22 571.1 ptxas info : 11 bytes gmem 2025-09-07T06:35:06.5617678Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5626808Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5631824Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5632873Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5633739Z #22 571.1 ptxas info : Compile time = 2.181 ms 2025-09-07T06:35:06.5638969Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5648274Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5653731Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5654756Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5655629Z #22 571.1 ptxas info : Compile time = 1.033 ms 2025-09-07T06:35:06.5660369Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5669575Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5675017Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5676075Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5676938Z #22 571.1 ptxas info : Compile time = 0.716 ms 2025-09-07T06:35:06.5682011Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5691531Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5696379Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5697324Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5698140Z #22 571.1 ptxas info : Compile time = 0.673 ms 2025-09-07T06:35:06.5703144Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5712628Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5717652Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5718653Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5719521Z #22 571.1 ptxas info : Compile time = 0.668 ms 2025-09-07T06:35:06.5724580Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5733918Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5738974Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5739930Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5740711Z #22 571.1 ptxas info : Compile time = 0.705 ms 2025-09-07T06:35:06.5745968Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5755087Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5760214Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5761243Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5762103Z #22 571.1 ptxas info : Compile time = 0.662 ms 2025-09-07T06:35:06.5767725Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5776857Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5781887Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5783206Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5784052Z #22 571.1 ptxas info : Compile time = 0.630 ms 2025-09-07T06:35:06.5788991Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5797687Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5802513Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5803556Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5804419Z #22 571.1 ptxas info : Compile time = 0.673 ms 2025-09-07T06:35:06.5809274Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5818394Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5823438Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5824449Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5825318Z #22 571.1 ptxas info : Compile time = 0.615 ms 2025-09-07T06:35:06.5830424Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5839538Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:35:06.5844622Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5845638Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5846458Z #22 571.1 ptxas info : Compile time = 0.622 ms 2025-09-07T06:35:06.5851336Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5859918Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.5864601Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5865643Z #22 571.1 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:35:06.5866521Z #22 571.1 ptxas info : Compile time = 39.976 ms 2025-09-07T06:35:06.5871659Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5880801Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5885909Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5887104Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5888062Z #22 571.1 ptxas info : Compile time = 1.006 ms 2025-09-07T06:35:06.5893259Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5902357Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5907208Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5908252Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5909110Z #22 571.1 ptxas info : Compile time = 0.810 ms 2025-09-07T06:35:06.5914114Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5923210Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5928271Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5929410Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5930266Z #22 571.1 ptxas info : Compile time = 0.695 ms 2025-09-07T06:35:06.5935392Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5944488Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:35:06.5951446Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5952510Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5953360Z #22 571.1 ptxas info : Compile time = 0.668 ms 2025-09-07T06:35:06.5958048Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5967208Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5972357Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5973350Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5974203Z #22 571.1 ptxas info : Compile time = 0.671 ms 2025-09-07T06:35:06.5979106Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.5987948Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.5992168Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.5993045Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.5993778Z #22 571.1 ptxas info : Compile time = 0.656 ms 2025-09-07T06:35:06.5998654Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6007803Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6012913Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6013955Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6014826Z #22 571.1 ptxas info : Compile time = 0.643 ms 2025-09-07T06:35:06.6020231Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6029202Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6034277Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6035481Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6036340Z #22 571.1 ptxas info : Compile time = 0.792 ms 2025-09-07T06:35:06.6041345Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6050196Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:35:06.6055176Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6056221Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6057075Z #22 571.1 ptxas info : Compile time = 0.624 ms 2025-09-07T06:35:06.6061944Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6070672Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6075827Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6076872Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6077735Z #22 571.1 ptxas info : Compile time = 0.636 ms 2025-09-07T06:35:06.6082526Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6090969Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6095784Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6096809Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6097640Z #22 571.1 ptxas info : Compile time = 0.674 ms 2025-09-07T06:35:06.6100137Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6104322Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6107213Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6108224Z #22 571.1 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:35:06.6109076Z #22 571.1 ptxas info : Compile time = 35.583 ms 2025-09-07T06:35:06.6113985Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6122966Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6127799Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6128800Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6129650Z #22 571.1 ptxas info : Compile time = 0.981 ms 2025-09-07T06:35:06.6134245Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6142536Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6147075Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6148079Z #22 571.1 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:35:06.6149124Z #22 571.1 ptxas info : Compile time = 49.253 ms 2025-09-07T06:35:06.6153402Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6161673Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6166250Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6167297Z #22 571.1 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:35:06.6168125Z #22 571.1 ptxas info : Compile time = 29.652 ms 2025-09-07T06:35:06.6173148Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6182430Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6187397Z #22 571.1 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6188436Z #22 571.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:06.6189287Z #22 571.1 ptxas info : Compile time = 0.990 ms 2025-09-07T06:35:06.6191837Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:06.6195876Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6198490Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6199539Z #22 571.1 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:35:06.6200390Z #22 571.1 ptxas info : Compile time = 41.975 ms 2025-09-07T06:35:06.6201215Z #22 571.1 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:35:06.6206277Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6215836Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6220889Z 
#22 571.1 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:35:06.6222308Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6223578Z #22 571.1 ptxas info : Compile time = 708.816 ms 2025-09-07T06:35:06.6228688Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6237912Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6243063Z #22 571.1 24 bytes stack frame, 24 bytes spill stores, 32 bytes spill loads 2025-09-07T06:35:06.6244616Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6245895Z #22 571.1 ptxas info : Compile time = 660.573 ms 2025-09-07T06:35:06.6251397Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:35:06.6260537Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6265463Z #22 571.1 40 bytes stack frame, 40 bytes spill stores, 44 bytes spill loads 2025-09-07T06:35:06.6266929Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6268180Z #22 571.1 ptxas info : Compile time = 742.317 ms 2025-09-07T06:35:06.6273726Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6283059Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6288065Z #22 571.1 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:35:06.6289457Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6290722Z #22 571.1 ptxas info : Compile time = 
664.418 ms 2025-09-07T06:35:06.6295858Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6305021Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6310129Z #22 571.1 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:35:06.6311525Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6312748Z #22 571.1 ptxas info : Compile time = 764.523 ms 2025-09-07T06:35:06.6317777Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6327173Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6332405Z #22 571.1 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:35:06.6333774Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6335010Z #22 571.1 ptxas info : Compile time = 694.043 ms 2025-09-07T06:35:06.6340073Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6349581Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6354688Z #22 571.1 48 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:35:06.6356072Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6357524Z #22 571.1 ptxas info : Compile time = 775.475 ms 2025-09-07T06:35:06.6362634Z #22 571.1 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6372016Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6376971Z #22 571.1 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:35:06.6378435Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6379683Z #22 571.1 ptxas info : Compile time = 717.337 ms 2025-09-07T06:35:06.6384476Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6393457Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6398489Z #22 571.1 64 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:35:06.6399921Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6401169Z #22 571.1 ptxas info : Compile time = 760.901 ms 2025-09-07T06:35:06.6406005Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6414186Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6419073Z #22 571.1 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:35:06.6420471Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6421710Z #22 571.1 ptxas info : Compile time = 670.941 ms 2025-09-07T06:35:06.6426871Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6435852Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6440946Z #22 571.1 48 bytes stack frame, 48 bytes spill stores, 56 bytes spill loads 2025-09-07T06:35:06.6442354Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6443627Z #22 571.1 ptxas info : Compile time = 742.405 ms 2025-09-07T06:35:06.6448290Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6457024Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6461912Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6463060Z #22 571.1 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:35:06.6464046Z #22 571.1 ptxas info : Compile time = 26.799 ms 2025-09-07T06:35:06.6469134Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6478219Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6483303Z #22 571.1 48 bytes stack frame, 48 bytes spill stores, 52 bytes spill loads 2025-09-07T06:35:06.6484733Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6486003Z #22 571.1 ptxas info : Compile time = 691.949 ms 2025-09-07T06:35:06.6491146Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6500348Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6505410Z #22 571.1 72 bytes stack frame, 68 bytes spill stores, 96 bytes spill loads 2025-09-07T06:35:06.6506824Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6508084Z #22 571.1 ptxas info : Compile time = 1073.312 ms 2025-09-07T06:35:06.6513029Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6522311Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6527270Z #22 571.1 48 bytes stack frame, 44 bytes spill stores, 60 bytes spill loads 2025-09-07T06:35:06.6528673Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6529881Z #22 571.1 ptxas info : Compile time = 836.403 ms 2025-09-07T06:35:06.6535061Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6543347Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6548075Z #22 571.1 80 bytes stack frame, 80 bytes spill stores, 84 bytes spill loads 2025-09-07T06:35:06.6549550Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6550616Z #22 571.1 ptxas info : Compile time = 1007.612 ms 2025-09-07T06:35:06.6555342Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6564091Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6569099Z #22 571.1 80 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:35:06.6570812Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6572206Z #22 571.1 ptxas info : Compile time = 833.852 ms 2025-09-07T06:35:06.6577169Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6586059Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6591002Z #22 571.1 64 bytes stack frame, 64 bytes spill stores, 88 bytes spill loads 2025-09-07T06:35:06.6592408Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6593611Z #22 571.1 ptxas info : Compile time = 997.305 ms 2025-09-07T06:35:06.6598335Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6607412Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6612441Z #22 571.1 72 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:35:06.6613856Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6615110Z #22 571.1 ptxas info : Compile time = 866.551 ms 2025-09-07T06:35:06.6620180Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6628980Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6633910Z #22 571.1 88 bytes stack frame, 88 bytes spill stores, 92 bytes spill loads 2025-09-07T06:35:06.6635304Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6636560Z #22 571.1 ptxas info : Compile time = 987.006 ms 2025-09-07T06:35:06.6641497Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6651350Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6656367Z #22 571.1 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads 2025-09-07T06:35:06.6657858Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6659162Z #22 571.1 ptxas info : Compile time = 841.652 ms 2025-09-07T06:35:06.6663811Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6672150Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6677090Z #22 571.1 104 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:35:06.6678524Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6679751Z #22 571.1 ptxas info : Compile time = 899.872 ms 2025-09-07T06:35:06.6684126Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6692590Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6697434Z #22 571.1 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:35:06.6698869Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6700059Z #22 571.1 ptxas info : Compile time = 781.511 ms 2025-09-07T06:35:06.6702565Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6705855Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6708396Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6709537Z #22 571.1 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:35:06.6710828Z #22 571.1 ptxas info : Compile time = 33.236 ms 2025-09-07T06:35:06.6715646Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6724382Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6729182Z #22 571.1 96 bytes stack frame, 92 bytes spill stores, 96 bytes spill loads 2025-09-07T06:35:06.6730540Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6731921Z #22 571.1 ptxas info : Compile time = 910.733 ms 2025-09-07T06:35:06.6736492Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6744760Z #22 571.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6749493Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6750612Z #22 571.1 ptxas info : Used 90 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:35:06.6751597Z #22 571.1 ptxas info : Compile time = 35.164 ms 2025-09-07T06:35:06.6756172Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6764406Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6769052Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6770675Z #22 571.1 ptxas info : Used 55 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:35:06.6771811Z #22 571.1 ptxas info : Compile time = 20.856 ms 2025-09-07T06:35:06.6777117Z #22 571.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6786147Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:06.6790812Z #22 571.1 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:35:06.6792167Z #22 571.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:35:06.6793370Z #22 571.1 ptxas info : Compile time = 780.540 ms 2025-09-07T06:35:06.6795861Z #22 571.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:06.6800001Z #22 571.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:35:06.6802676Z #22 571.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:06.6803798Z #22 571.1 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:35:06.6804825Z #22 571.1 ptxas info : Compile time = 36.510 ms 2025-09-07T06:35:26.6613948Z #22 591.4 [44/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:35:26.6627909Z #22 591.4 
ptxas info : 11 bytes gmem 2025-09-07T06:35:26.6631549Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6639256Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6643595Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6644492Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6645258Z #22 591.4 ptxas info : Compile time = 2.355 ms 2025-09-07T06:35:26.6649885Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6657819Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6662512Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6663613Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6664454Z #22 591.4 ptxas info : Compile time = 1.099 ms 2025-09-07T06:35:26.6668757Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6676762Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6681401Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6682425Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6683251Z #22 591.4 ptxas info : Compile time = 0.765 ms 2025-09-07T06:35:26.6687877Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6696929Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6719186Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6720011Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6720736Z #22 591.4 ptxas info : Compile time = 0.749 ms 2025-09-07T06:35:26.6725308Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6733741Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6738714Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6739655Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6740403Z #22 591.4 ptxas info : Compile time = 0.926 ms 2025-09-07T06:35:26.6745089Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6754252Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6758795Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6759766Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6760525Z #22 591.4 ptxas info : Compile time = 0.620 ms 2025-09-07T06:35:26.6764903Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6773351Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6777991Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6778917Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6779671Z #22 591.4 ptxas info : Compile time = 0.625 ms 2025-09-07T06:35:26.6784547Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6792303Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6796848Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6797822Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6798561Z #22 591.4 ptxas info : Compile time = 0.643 ms 2025-09-07T06:35:26.6804311Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6812606Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6817671Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6818554Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6819194Z #22 591.4 ptxas info : Compile time = 0.641 ms 2025-09-07T06:35:26.6823703Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6832165Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6836707Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6837584Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6838373Z #22 591.4 ptxas info : Compile time = 0.625 ms 2025-09-07T06:35:26.6843222Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6852125Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6856979Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6857815Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6858591Z #22 591.4 ptxas info : Compile time = 0.617 ms 2025-09-07T06:35:26.6863332Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6871531Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6876504Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6877424Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6878150Z #22 591.4 ptxas info : Compile time = 0.607 ms 2025-09-07T06:35:26.6882426Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6890881Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6894671Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6895385Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6895978Z #22 591.4 ptxas info : Compile time = 0.620 ms 2025-09-07T06:35:26.6899473Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6906223Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6909732Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6910431Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6911023Z #22 591.4 ptxas info : Compile time = 0.606 ms 2025-09-07T06:35:26.6914500Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6920856Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6924373Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6925083Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6925657Z #22 591.4 ptxas info : Compile time = 0.650 ms 2025-09-07T06:35:26.6929487Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6936357Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6939989Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6940682Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6941273Z #22 591.4 ptxas info : Compile time = 0.663 ms 2025-09-07T06:35:26.6944742Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6959067Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6962997Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6963716Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6964297Z #22 591.4 ptxas info : Compile time = 0.592 ms 2025-09-07T06:35:26.6967753Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.6974256Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6977752Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6978440Z #22 591.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.6979026Z #22 591.4 ptxas info : Compile time = 0.590 ms 2025-09-07T06:35:26.6979591Z #22 591.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 
2025-09-07T06:35:26.6982931Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.6989260Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.6992541Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.6993343Z #22 591.4 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.6994035Z #22 591.4 ptxas info : Compile time = 1013.997 ms 2025-09-07T06:35:26.6997390Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7003455Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7006808Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7007608Z #22 591.4 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.7008330Z #22 591.4 ptxas info : Compile time = 1181.562 ms 2025-09-07T06:35:26.7012148Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7018168Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7021492Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7022297Z #22 591.4 ptxas info : Used 253 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.7022982Z #22 591.4 ptxas info : Compile time = 2232.952 ms 2025-09-07T06:35:26.7026488Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7032874Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7036623Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7037398Z #22 591.4 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.7038085Z #22 591.4 ptxas info : Compile time = 1913.655 ms 2025-09-07T06:35:26.7041580Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7047969Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7052094Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7126985Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.7127815Z #22 591.4 ptxas info : Compile time = 2526.346 ms 2025-09-07T06:35:26.7132616Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7139008Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7142510Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7143297Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.7144034Z #22 591.4 ptxas info : Compile time = 4143.438 ms 2025-09-07T06:35:26.7147531Z #22 591.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7154239Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7158139Z #22 591.4 120 bytes stack frame, 148 bytes spill stores, 256 bytes spill loads 2025-09-07T06:35:26.7159159Z #22 591.4 ptxas info : Used 255 registers, used 6 barriers, 120 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:35:26.7160011Z #22 591.4 ptxas info : Compile time = 2479.206 ms 2025-09-07T06:35:26.7163829Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7169828Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7173292Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7174119Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.7174809Z #22 591.4 ptxas info : Compile time = 2434.518 ms 2025-09-07T06:35:26.7178794Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7185296Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7188643Z #22 591.4 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:35:26.7189597Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7190482Z #22 591.4 ptxas info : Compile time = 4754.086 ms 2025-09-07T06:35:26.7193976Z #22 591.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7200288Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7203836Z #22 591.4 120 bytes stack frame, 148 bytes spill stores, 160 bytes spill loads 2025-09-07T06:35:26.7204842Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7205934Z #22 591.4 ptxas info : Compile time = 1297.953 ms 2025-09-07T06:35:26.7209647Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7216156Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7219647Z #22 591.4 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:35:26.7220642Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7221515Z #22 591.4 ptxas info : Compile time = 1488.514 ms 2025-09-07T06:35:26.7224976Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7231556Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7235077Z #22 591.4 88 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads 2025-09-07T06:35:26.7236038Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7236893Z #22 591.4 ptxas info : 
Compile time = 3137.895 ms 2025-09-07T06:35:26.7240358Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7246672Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7250604Z #22 591.4 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:35:26.7251704Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7252568Z #22 591.4 ptxas info : Compile time = 2660.204 ms 2025-09-07T06:35:26.7256295Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7262697Z #22 591.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7266221Z #22 591.4 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:35:26.7267215Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7268090Z #22 591.4 ptxas info : Compile time = 2996.269 ms 2025-09-07T06:35:26.7271584Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7277916Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7282734Z #22 591.4 224 bytes stack frame, 164 bytes spill stores, 628 bytes spill loads 2025-09-07T06:35:26.7283759Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 224 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7284636Z #22 591.4 ptxas 
info : Compile time = 6320.672 ms 2025-09-07T06:35:26.7289606Z #22 591.4 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi48EEES4_EEELi128EN7cutlass6half_tEfNS7_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:35:26.7294977Z #22 591.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.7298830Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7305660Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7309750Z #22 591.4 136 bytes stack frame, 172 bytes spill stores, 312 bytes spill loads 2025-09-07T06:35:26.7310770Z #22 591.4 ptxas info : 
Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:35:26.7311655Z #22 591.4 ptxas info : Compile time = 2549.476 ms 2025-09-07T06:35:26.7315156Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7321512Z #22 591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7325044Z #22 591.4 112 bytes stack frame, 176 bytes spill stores, 212 bytes spill loads 2025-09-07T06:35:26.7326040Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7326921Z #22 591.4 ptxas info : Compile time = 2439.136 ms 2025-09-07T06:35:26.7330677Z #22 591.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.7337753Z #22 
591.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.7341266Z #22 591.4 176 bytes stack frame, 284 bytes spill stores, 320 bytes spill loads 2025-09-07T06:35:26.7342289Z #22 591.4 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.7343167Z #22 591.4 ptxas info : Compile time = 5068.338 ms 2025-09-07T06:35:26.8779409Z #22 591.7 [45/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
--extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:35:26.8793443Z #22 591.7 ptxas info : 11 bytes gmem 2025-09-07T06:35:26.8796829Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8802951Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8806661Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8807382Z #22 591.7 ptxas info : Used 4 registers, used 0 
barriers 2025-09-07T06:35:26.8807958Z #22 591.7 ptxas info : Compile time = 2.015 ms 2025-09-07T06:35:26.8811500Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8817607Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8820992Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8821690Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.8822283Z #22 591.7 ptxas info : Compile time = 21.104 ms 2025-09-07T06:35:26.8825654Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8831995Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8835343Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8836060Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.8836631Z #22 591.7 ptxas info : Compile time = 1.018 ms 2025-09-07T06:35:26.8840190Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8940287Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8943908Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8944644Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.8945282Z #22 591.7 ptxas info : Compile time = 0.926 ms 2025-09-07T06:35:26.8949521Z #22 591.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8955969Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8959544Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8960243Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.8960856Z #22 591.7 ptxas info : Compile time = 1.013 ms 2025-09-07T06:35:26.8968329Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8975049Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8978909Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8979613Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.8980188Z #22 591.7 ptxas info : Compile time = 0.693 ms 2025-09-07T06:35:26.8983713Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.8990119Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.8993631Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.8994323Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.8994914Z #22 591.7 ptxas info : Compile time = 0.674 ms 2025-09-07T06:35:26.8998269Z #22 591.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9004576Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9007950Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9008650Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9009220Z #22 591.7 ptxas info : Compile time = 0.637 ms 2025-09-07T06:35:26.9012751Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9018850Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9022223Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9023106Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9023769Z #22 591.7 ptxas info : Compile time = 0.639 ms 2025-09-07T06:35:26.9027251Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9033643Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9037146Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9037864Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9038446Z #22 591.7 ptxas info : Compile time = 0.661 ms 2025-09-07T06:35:26.9041948Z #22 591.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9048537Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9052510Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9053220Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9053808Z #22 591.7 ptxas info : Compile time = 0.615 ms 2025-09-07T06:35:26.9057315Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9063891Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9067396Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9068090Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9068686Z #22 591.7 ptxas info : Compile time = 0.631 ms 2025-09-07T06:35:26.9072216Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9078880Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9082397Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9083111Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9083702Z #22 591.7 ptxas info : Compile time = 0.615 ms 2025-09-07T06:35:26.9087241Z #22 591.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9093796Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9097326Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9098016Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9098910Z #22 591.7 ptxas info : Compile time = 0.608 ms 2025-09-07T06:35:26.9135298Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9144101Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9149391Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9150402Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9151142Z #22 591.7 ptxas info : Compile time = 0.643 ms 2025-09-07T06:35:26.9155833Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9165160Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9170091Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9171229Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9172027Z #22 591.7 ptxas info : Compile time = 0.645 ms 2025-09-07T06:35:26.9176895Z #22 591.7 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9185399Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9190258Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9191235Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9192062Z #22 591.7 ptxas info : Compile time = 0.604 ms 2025-09-07T06:35:26.9197319Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:35:26.9206081Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9210903Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9212064Z #22 591.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:35:26.9212826Z #22 591.7 ptxas info : Compile time = 0.636 ms 2025-09-07T06:35:26.9213641Z #22 591.7 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:35:26.9217937Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9225852Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9230759Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9231869Z #22 591.7 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9232823Z #22 591.7 ptxas info : Compile time = 941.203 ms 
2025-09-07T06:35:26.9237509Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9246010Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9250734Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9251879Z #22 591.7 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9252738Z #22 591.7 ptxas info : Compile time = 1123.213 ms 2025-09-07T06:35:26.9256955Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9265948Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9270574Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9271650Z #22 591.7 ptxas info : Used 253 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9272634Z #22 591.7 ptxas info : Compile time = 2135.828 ms 2025-09-07T06:35:26.9277730Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9286944Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9292453Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9293597Z #22 591.7 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9294527Z #22 591.7 ptxas info : Compile time = 1964.213 ms 2025-09-07T06:35:26.9299563Z #22 591.7 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9308217Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9313254Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9314409Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9315408Z #22 591.7 ptxas info : Compile time = 2788.961 ms 2025-09-07T06:35:26.9320400Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9329856Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9334608Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9335614Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9336477Z #22 591.7 ptxas info : Compile time = 4371.504 ms 2025-09-07T06:35:26.9341185Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9349932Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9354285Z #22 591.7 120 bytes stack frame, 148 bytes spill stores, 256 bytes spill loads 2025-09-07T06:35:26.9355514Z #22 591.7 ptxas info : Used 255 registers, used 6 barriers, 120 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:35:26.9356877Z #22 591.7 ptxas info : Compile time = 2559.417 ms 
2025-09-07T06:35:26.9361237Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9368756Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9373160Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9374111Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:35:26.9374934Z #22 591.7 ptxas info : Compile time = 2317.336 ms 2025-09-07T06:35:26.9379051Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9386680Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9391263Z #22 591.7 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:35:26.9394679Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9395961Z #22 591.7 ptxas info : Compile time = 4203.853 ms 2025-09-07T06:35:26.9400262Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9408008Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9412478Z #22 591.7 120 bytes stack frame, 148 bytes spill stores, 160 bytes spill loads 2025-09-07T06:35:26.9413769Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9414832Z #22 591.7 ptxas info : Compile 
time = 1260.082 ms 2025-09-07T06:35:26.9419079Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9427849Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9432814Z #22 591.7 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:35:26.9434121Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9435292Z #22 591.7 ptxas info : Compile time = 1523.830 ms 2025-09-07T06:35:26.9439855Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9449088Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9453837Z #22 591.7 88 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads 2025-09-07T06:35:26.9455324Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9456462Z #22 591.7 ptxas info : Compile time = 3031.601 ms 2025-09-07T06:35:26.9460908Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9469673Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9474816Z #22 591.7 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:35:26.9476107Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9477264Z #22 
591.7 ptxas info : Compile time = 2549.133 ms 2025-09-07T06:35:26.9481908Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9490451Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9495206Z #22 591.7 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:35:26.9496520Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9497633Z #22 591.7 ptxas info : Compile time = 2883.557 ms 2025-09-07T06:35:26.9502299Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9511299Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9516226Z #22 591.7 224 bytes stack frame, 164 bytes spill stores, 628 bytes spill loads 2025-09-07T06:35:26.9517516Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 224 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9518612Z #22 591.7 ptxas info : Compile time = 6039.244 ms 2025-09-07T06:35:26.9525518Z #22 591.7 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi48EEES4_EEELi128EN7cutlass10bfloat16_tEfNS7_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:35:26.9532959Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:35:26.9538420Z #22 591.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9547135Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9552997Z #22 591.7 136 bytes stack frame, 172 bytes spill stores, 312 bytes spill loads 2025-09-07T06:35:26.9554441Z #22 591.7 ptxas info : Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:35:26.9555688Z #22 591.7 ptxas info : Compile time = 2841.474 ms 2025-09-07T06:35:26.9560766Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9569643Z #22 591.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9574289Z #22 591.7 112 bytes stack frame, 176 bytes spill stores, 212 bytes spill loads 2025-09-07T06:35:26.9575450Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9576511Z #22 591.7 ptxas info : Compile time = 2481.952 ms 2025-09-07T06:35:26.9581083Z #22 591.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:35:26.9589510Z #22 591.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:35:26.9595157Z #22 591.7 216 bytes stack frame, 176 bytes spill stores, 396 bytes spill loads 2025-09-07T06:35:26.9596933Z #22 591.7 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:35:26.9598390Z 
#22 591.7 ptxas info : Compile time = 6090.392 ms 2025-09-07T06:35:26.9606182Z #22 591.7 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEES4_EEELi128EN7cutlass10bfloat16_tEfNS7_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:35:26.9612570Z #22 591.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.1901302Z #22 627.0 [46/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' 
-DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:02.1915519Z #22 627.0 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:02.1919520Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.1926159Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:36:02.1929785Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.1930582Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:02.1931462Z #22 627.0 ptxas info : Compile time = 1.592 ms 2025-09-07T06:36:02.1935418Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.1942603Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.1946779Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.1947574Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:02.1948241Z #22 627.0 ptxas info : Compile time = 0.937 ms 2025-09-07T06:36:02.1952409Z #22 627.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.1959639Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.1963575Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.1964352Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:02.1965034Z #22 627.0 ptxas info : Compile time = 0.861 ms 2025-09-07T06:36:02.1969190Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.1976305Z #22 627.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.1980117Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.1980912Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:36:02.1981588Z #22 627.0 ptxas info : Compile time = 0.650 ms 2025-09-07T06:36:02.1989633Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.1996936Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2001214Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2002012Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:36:02.2002691Z #22 627.0 ptxas info : Compile time = 0.563 ms 2025-09-07T06:36:02.2006646Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.2014053Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2017967Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2018764Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:02.2019451Z #22 627.0 ptxas info : Compile time = 0.539 ms 2025-09-07T06:36:02.2023369Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:36:02.2030344Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2034156Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2034936Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:36:02.2035636Z #22 627.0 ptxas info : Compile time = 0.563 ms 2025-09-07T06:36:02.2039729Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.2046944Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2051398Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:36:02.2052191Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:02.2052876Z #22 627.0 ptxas info : Compile time = 0.542 ms 2025-09-07T06:36:02.2056776Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.2064229Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2068161Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2068949Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:02.2069646Z #22 627.0 ptxas info : Compile time = 0.558 ms 2025-09-07T06:36:02.2070165Z #22 627.0 ptxas info : 11 bytes gmem 2025-09-07T06:36:02.2073930Z #22 627.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2080551Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2084161Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2084861Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:02.2085466Z #22 627.0 ptxas info : Compile time = 564.517 ms 2025-09-07T06:36:02.2089611Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2096976Z #22 627.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2101128Z #22 627.0 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:36:02.2101990Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:36:02.2102743Z #22 627.0 ptxas info : Compile time = 684.730 ms 2025-09-07T06:36:02.2106667Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2113889Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2117854Z #22 627.0 32 bytes stack frame, 64 bytes spill stores, 96 bytes spill loads 2025-09-07T06:36:02.2118747Z #22 627.0 ptxas info : Used 168 
registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:36:02.2119517Z #22 627.0 ptxas info : Compile time = 1781.902 ms 2025-09-07T06:36:02.2125207Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2132377Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2136159Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2136881Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:02.2137488Z #22 627.0 ptxas info : Compile time = 1132.036 ms 2025-09-07T06:36:02.2141555Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:36:02.2148723Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2153223Z #22 627.0 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:36:02.2154103Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:36:02.2154858Z #22 627.0 ptxas info : Compile time = 1302.074 ms 2025-09-07T06:36:02.2158788Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2166019Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2169971Z #22 627.0 48 bytes stack frame, 64 bytes spill 
stores, 112 bytes spill loads 2025-09-07T06:36:02.2170858Z #22 627.0 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:36:02.2171799Z #22 627.0 ptxas info : Compile time = 2785.194 ms 2025-09-07T06:36:02.2175764Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2182706Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2186505Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.2187240Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:02.2187827Z #22 627.0 ptxas info : Compile time = 853.009 ms 2025-09-07T06:36:02.2191944Z #22 627.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2199156Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2203224Z #22 627.0 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:36:02.2204095Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:36:02.2204856Z #22 627.0 ptxas info : Compile time = 954.229 ms 2025-09-07T06:36:02.2208757Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.2216106Z #22 627.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.2220053Z #22 627.0 40 bytes stack frame, 68 bytes spill stores, 108 bytes spill loads 2025-09-07T06:36:02.2220939Z #22 627.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:02.2221708Z #22 627.0 ptxas info : Compile time = 2225.880 ms 2025-09-07T06:36:02.3485232Z #22 627.0 [47/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:02.3499198Z #22 627.0 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:02.3503159Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3509696Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3513290Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:36:02.3514104Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:36:02.3514788Z #22 627.0 ptxas info : Compile time = 1.751 ms 2025-09-07T06:36:02.3518440Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3525081Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3528899Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3529708Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:36:02.3530380Z #22 627.0 ptxas info : Compile time = 0.853 ms 2025-09-07T06:36:02.3534481Z #22 627.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3541621Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3545638Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3546438Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:36:02.3547119Z #22 627.0 ptxas info : Compile time = 0.881 ms 2025-09-07T06:36:02.3551273Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3558670Z #22 627.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3562544Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3563348Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:36:02.3564011Z #22 627.0 ptxas info : Compile time = 0.592 ms 2025-09-07T06:36:02.3567890Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3575001Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3579012Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3579809Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 
2025-09-07T06:36:02.3580486Z #22 627.0 ptxas info : Compile time = 0.561 ms 2025-09-07T06:36:02.3584426Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3591670Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3595790Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3596590Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:36:02.3597253Z #22 627.0 ptxas info : Compile time = 0.550 ms 2025-09-07T06:36:02.3601205Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:36:02.3608613Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3612697Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3613492Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:36:02.3614175Z #22 627.0 ptxas info : Compile time = 0.588 ms 2025-09-07T06:36:02.3617926Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3624819Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3628774Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3629566Z #22 627.0 
ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:36:02.3630230Z #22 627.0 ptxas info : Compile time = 0.551 ms 2025-09-07T06:36:02.3634475Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3643611Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3649269Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3650363Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:36:02.3651293Z #22 627.0 ptxas info : Compile time = 0.532 ms 2025-09-07T06:36:02.3656177Z #22 627.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:02.3665931Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3670951Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3671975Z #22 627.0 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:36:02.3672831Z #22 627.0 ptxas info : Compile time = 0.552 ms 2025-09-07T06:36:02.3673495Z #22 627.0 ptxas info : 11 bytes gmem 2025-09-07T06:36:02.3678281Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3686700Z #22 627.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3692095Z #22 627.0 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:36:02.3693327Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:36:02.3694353Z #22 627.0 ptxas info : Compile time = 642.573 ms 2025-09-07T06:36:02.3699101Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3708188Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3713435Z #22 627.0 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:36:02.3714611Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:36:02.3715639Z #22 627.0 
ptxas info : Compile time = 637.127 ms 2025-09-07T06:36:02.3720988Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3731075Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3736201Z #22 627.0 40 bytes stack frame, 76 bytes spill stores, 84 bytes spill loads 2025-09-07T06:36:02.3737364Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:02.3738348Z #22 627.0 ptxas info : Compile time = 812.353 ms 2025-09-07T06:36:02.3743442Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3854543Z #22 627.0 ptxas info 
: Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3860391Z #22 627.0 56 bytes stack frame, 280 bytes spill stores, 304 bytes spill loads 2025-09-07T06:36:02.3861673Z #22 627.0 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:02.3862755Z #22 627.0 ptxas info : Compile time = 1679.535 ms 2025-09-07T06:36:02.3868034Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3877134Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3882848Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3883884Z #22 627.0 ptxas info : 
Used 168 registers, used 9 barriers 2025-09-07T06:36:02.3884724Z #22 627.0 ptxas info : Compile time = 1095.472 ms 2025-09-07T06:36:02.3890417Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3900613Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3906027Z #22 627.0 32 bytes stack frame, 148 bytes spill stores, 168 bytes spill loads 2025-09-07T06:36:02.3907208Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:36:02.3908251Z #22 627.0 ptxas info : Compile time = 1321.138 ms 2025-09-07T06:36:02.3913533Z #22 627.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3923105Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3928376Z #22 627.0 40 bytes stack frame, 216 bytes spill stores, 280 bytes spill loads 2025-09-07T06:36:02.3929565Z #22 627.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:02.3930538Z #22 627.0 ptxas info : Compile time = 2280.007 ms 2025-09-07T06:36:02.3935809Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3945621Z #22 627.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3951116Z #22 627.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:02.3952101Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:02.3952919Z #22 627.0 ptxas info : Compile time = 881.778 ms 2025-09-07T06:36:02.3958482Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3968192Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3973432Z #22 627.0 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:36:02.3974607Z #22 627.0 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 
2025-09-07T06:36:02.3975678Z #22 627.0 ptxas info : Compile time = 1066.792 ms 2025-09-07T06:36:02.3981173Z #22 627.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:02.3991023Z #22 627.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:02.3996479Z #22 627.0 48 bytes stack frame, 252 bytes spill stores, 268 bytes spill loads 2025-09-07T06:36:02.3997627Z #22 627.0 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:36:02.3998629Z #22 627.0 ptxas info : Compile time = 2135.285 ms 2025-09-07T06:36:05.7131639Z #22 630.5 [48/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:05.7151200Z #22 630.5 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:05.7154830Z #22 630.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7162465Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7167794Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7168895Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7169742Z #22 630.5 ptxas info : Compile time = 1.291 ms 2025-09-07T06:36:05.7175866Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7186213Z #22 630.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7191849Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7192909Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7193664Z #22 630.5 ptxas info : Compile time = 0.712 ms 2025-09-07T06:36:05.7197799Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7206206Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7212337Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7213418Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:36:05.7214365Z #22 630.5 ptxas info : Compile time = 0.577 ms 2025-09-07T06:36:05.7219589Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7229281Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7233857Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7234637Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7235291Z #22 630.5 ptxas info : Compile time = 0.381 ms 2025-09-07T06:36:05.7239532Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7259020Z #22 630.5 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7264830Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7265880Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7266843Z #22 630.5 ptxas info : Compile time = 0.374 ms 2025-09-07T06:36:05.7272757Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7281148Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7285573Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7286504Z #22 630.5 
ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7287493Z #22 630.5 ptxas info : Compile time = 0.391 ms 2025-09-07T06:36:05.7292638Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7302202Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7307392Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7308474Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7309399Z #22 630.5 ptxas info : Compile time = 0.400 ms 2025-09-07T06:36:05.7315175Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:36:05.7322622Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7327586Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7328667Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7329602Z #22 630.5 ptxas info : Compile time = 0.361 ms 2025-09-07T06:36:05.7335460Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:05.7345770Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7352050Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:36:05.7353111Z #22 630.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:05.7354048Z #22 630.5 ptxas info : Compile time = 0.360 ms 2025-09-07T06:36:05.7354671Z #22 630.5 ptxas info : 11 bytes gmem 2025-09-07T06:36:05.7359373Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7366051Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7370780Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7371883Z #22 630.5 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:05.7372687Z #22 630.5 ptxas info : Compile time = 650.728 ms 2025-09-07T06:36:05.7378401Z #22 630.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7388726Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7394359Z #22 630.5 56 bytes stack frame, 168 bytes spill stores, 184 bytes spill loads 2025-09-07T06:36:05.7395580Z #22 630.5 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:05.7396648Z #22 630.5 ptxas info : Compile time = 894.878 ms 2025-09-07T06:36:05.7400957Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7409016Z #22 630.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7415159Z #22 630.5 80 bytes stack frame, 176 bytes spill stores, 212 bytes spill loads 2025-09-07T06:36:05.7416308Z #22 630.5 ptxas info : Used 168 registers, used 16 barriers, 80 bytes cumulative stack size 2025-09-07T06:36:05.7417349Z #22 630.5 ptxas info : Compile time = 1851.980 ms 2025-09-07T06:36:05.7422483Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7432197Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7437491Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7438481Z #22 630.5 ptxas info : Used 168 registers, used 9 barriers 
2025-09-07T06:36:05.7439271Z #22 630.5 ptxas info : Compile time = 1015.002 ms 2025-09-07T06:36:05.7444062Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7452671Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7458280Z #22 630.5 56 bytes stack frame, 176 bytes spill stores, 208 bytes spill loads 2025-09-07T06:36:05.7459509Z #22 630.5 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:05.7460594Z #22 630.5 ptxas info : Compile time = 1586.683 ms 2025-09-07T06:36:05.7466645Z #22 630.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7477108Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7482535Z #22 630.5 72 bytes stack frame, 376 bytes spill stores, 424 bytes spill loads 2025-09-07T06:36:05.7483424Z #22 630.5 ptxas info : Used 168 registers, used 16 barriers, 72 bytes cumulative stack size 2025-09-07T06:36:05.7484180Z #22 630.5 ptxas info : Compile time = 2861.389 ms 2025-09-07T06:36:05.7487901Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7496889Z #22 630.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7502046Z #22 630.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:05.7503015Z #22 630.5 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:05.7503829Z #22 630.5 ptxas info : Compile time = 883.186 ms 2025-09-07T06:36:05.7509747Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7520047Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7524964Z #22 630.5 56 bytes stack frame, 172 bytes spill stores, 188 bytes spill loads 2025-09-07T06:36:05.7525843Z #22 630.5 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 
2025-09-07T06:36:05.7526603Z #22 630.5 ptxas info : Compile time = 1286.520 ms 2025-09-07T06:36:05.7530755Z #22 630.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:05.7540535Z #22 630.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:05.7546517Z #22 630.5 72 bytes stack frame, 164 bytes spill stores, 204 bytes spill loads 2025-09-07T06:36:05.7547710Z #22 630.5 ptxas info : Used 168 registers, used 16 barriers, 72 bytes cumulative stack size 2025-09-07T06:36:05.8619811Z #22 630.5 ptxas info : Compile time = 2473.829 ms 2025-09-07T06:36:07.3008766Z #22 632.1 [49/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:07.3025000Z #22 632.1 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:07.3028654Z #22 632.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3035664Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3039952Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3040918Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3042212Z #22 632.1 ptxas info : Compile time = 1.779 ms 2025-09-07T06:36:07.3048019Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3058986Z #22 632.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3064894Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3065998Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3066929Z #22 632.1 ptxas info : Compile time = 0.982 ms 2025-09-07T06:36:07.3072621Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3085336Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3091356Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3092407Z #22 632.1 ptxas info : Used 4 registers, used 0 
barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3093134Z #22 632.1 ptxas info : Compile time = 0.849 ms 2025-09-07T06:36:07.3096672Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3103489Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3107165Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3107969Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3108804Z #22 632.1 ptxas info : Compile time = 0.575 ms 2025-09-07T06:36:07.3112810Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3122665Z #22 632.1 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3128135Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3129238Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3130176Z #22 632.1 ptxas info : Compile time = 20.960 ms 2025-09-07T06:36:07.3136080Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3146847Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3153312Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3154421Z #22 632.1 ptxas info : 
Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3155337Z #22 632.1 ptxas info : Compile time = 0.732 ms 2025-09-07T06:36:07.3160609Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3168195Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3171947Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3172723Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3173540Z #22 632.1 ptxas info : Compile time = 0.671 ms 2025-09-07T06:36:07.3177540Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:36:07.3184747Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3190360Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3191442Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3192348Z #22 632.1 ptxas info : Compile time = 0.623 ms 2025-09-07T06:36:07.3197910Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:07.3208622Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3214584Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:36:07.3215648Z #22 632.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:07.3216607Z #22 632.1 ptxas info : Compile time = 0.590 ms 2025-09-07T06:36:07.3217259Z #22 632.1 ptxas info : 11 bytes gmem 2025-09-07T06:36:07.3222493Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3232489Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3237539Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3238250Z #22 632.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:07.3238988Z #22 632.1 ptxas info : Compile time = 511.811 ms 2025-09-07T06:36:07.3242890Z #22 632.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3250451Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3254464Z #22 632.1 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:36:07.3255353Z #22 632.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:36:07.3256096Z #22 632.1 ptxas info : Compile time = 759.718 ms 2025-09-07T06:36:07.3260513Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3271010Z #22 632.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3276566Z #22 632.1 40 bytes stack frame, 88 bytes spill stores, 108 bytes spill loads 2025-09-07T06:36:07.3277761Z #22 632.1 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:07.3278838Z #22 632.1 ptxas info : Compile time = 1980.068 ms 2025-09-07T06:36:07.3284150Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3294209Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3299525Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3300520Z #22 632.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:07.3301556Z 
#22 632.1 ptxas info : Compile time = 1021.993 ms 2025-09-07T06:36:07.3307305Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3314480Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3318336Z #22 632.1 16 bytes stack frame, 40 bytes spill stores, 32 bytes spill loads 2025-09-07T06:36:07.3319205Z #22 632.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:36:07.3320038Z #22 632.1 ptxas info : Compile time = 1389.713 ms 2025-09-07T06:36:07.3323944Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:36:07.3332947Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3338639Z #22 632.1 64 bytes stack frame, 92 bytes spill stores, 120 bytes spill loads 2025-09-07T06:36:07.3339849Z #22 632.1 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:36:07.3340845Z #22 632.1 ptxas info : Compile time = 2961.959 ms 2025-09-07T06:36:07.3346024Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3356314Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3361574Z #22 632.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:07.3362565Z #22 632.1 
ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:07.3363582Z #22 632.1 ptxas info : Compile time = 710.188 ms 2025-09-07T06:36:07.3369486Z #22 632.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3380027Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3384290Z #22 632.1 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:36:07.3385159Z #22 632.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:36:07.3385897Z #22 632.1 ptxas info : Compile time = 965.364 ms 2025-09-07T06:36:07.3389748Z #22 632.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:07.3397126Z #22 632.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:07.3401081Z #22 632.1 40 bytes stack frame, 96 bytes spill stores, 116 bytes spill loads 2025-09-07T06:36:07.3401974Z #22 632.1 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:07.3402766Z #22 632.1 ptxas info : Compile time = 2307.801 ms 2025-09-07T06:36:09.5829309Z #22 634.4 [50/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:09.7382613Z #22 634.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:09.7387736Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7396290Z #22 
634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7401284Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7402304Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7403141Z #22 634.4 ptxas info : Compile time = 1.729 ms 2025-09-07T06:36:09.7408490Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7418288Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7423881Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7424839Z #22 634.4 ptxas info : Used 4 
registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7425650Z #22 634.4 ptxas info : Compile time = 0.891 ms 2025-09-07T06:36:09.7430970Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7439969Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7444880Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7445920Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7446772Z #22 634.4 ptxas info : Compile time = 0.819 ms 2025-09-07T06:36:09.7451778Z #22 634.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7460800Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7465987Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7467100Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:36:09.7468086Z #22 634.4 ptxas info : Compile time = 0.579 ms 2025-09-07T06:36:09.7473582Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7483556Z #22 634.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7488781Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7489841Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7490743Z #22 634.4 ptxas info : Compile time = 0.555 ms 2025-09-07T06:36:09.7496812Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7506738Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7512437Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7513488Z #22 634.4 ptxas info : Used 4 registers, 
used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7514296Z #22 634.4 ptxas info : Compile time = 0.540 ms 2025-09-07T06:36:09.7519087Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7528258Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7533825Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7534992Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:36:09.7535927Z #22 634.4 ptxas info : Compile time = 0.564 ms 2025-09-07T06:36:09.7541613Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:36:09.7551636Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7557560Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7558642Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7559560Z #22 634.4 ptxas info : Compile time = 0.534 ms 2025-09-07T06:36:09.7565626Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:09.7577099Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7583142Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads 2025-09-07T06:36:09.7584294Z #22 634.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:09.7585276Z #22 634.4 ptxas info : Compile time = 0.540 ms 2025-09-07T06:36:09.7586004Z #22 634.4 ptxas info : 11 bytes gmem 2025-09-07T06:36:09.7591463Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7601170Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7606749Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7607831Z #22 634.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:09.7608687Z #22 634.4 ptxas info : Compile time = 709.219 ms 2025-09-07T06:36:09.7615163Z #22 634.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7626228Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7632163Z #22 634.4 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:36:09.7633436Z #22 634.4 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:36:09.7634560Z #22 634.4 ptxas info : Compile time = 875.712 ms 2025-09-07T06:36:09.7640734Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7651951Z #22 634.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7657863Z #22 634.4 40 bytes stack frame, 92 bytes spill stores, 136 bytes spill loads 2025-09-07T06:36:09.7659177Z #22 634.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:09.7660302Z #22 634.4 ptxas info : Compile time = 2036.204 ms 2025-09-07T06:36:09.7666105Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7676846Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7682531Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7683573Z #22 634.4 ptxas info : Used 168 
registers, used 9 barriers 2025-09-07T06:36:09.7684696Z #22 634.4 ptxas info : Compile time = 1664.582 ms 2025-09-07T06:36:09.7690728Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7701947Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7707944Z #22 634.4 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:36:09.7709233Z #22 634.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:36:09.7710549Z #22 634.4 ptxas info : Compile time = 1668.299 ms 2025-09-07T06:36:09.7716470Z #22 634.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7727754Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7733880Z #22 634.4 56 bytes stack frame, 120 bytes spill stores, 152 bytes spill loads 2025-09-07T06:36:09.7735166Z #22 634.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:09.7736294Z #22 634.4 ptxas info : Compile time = 3184.143 ms 2025-09-07T06:36:09.7741908Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7751959Z #22 634.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7757217Z #22 634.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:09.7758406Z #22 634.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:09.7759014Z #22 634.4 ptxas info : Compile time = 1254.914 ms 2025-09-07T06:36:09.7763028Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7770372Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7774540Z #22 634.4 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:36:09.7775391Z #22 634.4 ptxas info : Used 168 registers, used 9 barriers, 8 
bytes cumulative stack size 2025-09-07T06:36:09.7776356Z #22 634.4 ptxas info : Compile time = 1237.980 ms 2025-09-07T06:36:09.7780328Z #22 634.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:09.7787818Z #22 634.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:09.7791798Z #22 634.4 40 bytes stack frame, 92 bytes spill stores, 132 bytes spill loads 2025-09-07T06:36:09.7792684Z #22 634.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:36:09.7793460Z #22 634.4 ptxas info : Compile time = 2493.770 ms 2025-09-07T06:36:18.9498722Z #22 643.7 [51/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:19.1011095Z #22 643.7 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:19.1015204Z #22 643.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1021981Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1025960Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1026748Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1028005Z #22 643.7 ptxas info : Compile time = 1.644 ms 2025-09-07T06:36:19.1031972Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1039270Z #22 643.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1043267Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1044068Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1044768Z #22 643.7 ptxas info : Compile time = 0.875 ms 2025-09-07T06:36:19.1049334Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1056751Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1060717Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1061506Z #22 643.7 ptxas info : Used 4 registers, 
used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1062193Z #22 643.7 ptxas info : Compile time = 0.752 ms 2025-09-07T06:36:19.1066110Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1072853Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1076734Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1077520Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1078201Z #22 643.7 ptxas info : Compile time = 0.554 ms 2025-09-07T06:36:19.1082183Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:36:19.1089471Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1093686Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1094460Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1095146Z #22 643.7 ptxas info : Compile time = 0.488 ms 2025-09-07T06:36:19.1099294Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1106567Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1110492Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads 2025-09-07T06:36:19.1111287Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1111965Z #22 643.7 ptxas info : Compile time = 0.461 ms 2025-09-07T06:36:19.1115788Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1122490Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1126422Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1127193Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1127877Z #22 643.7 ptxas info : Compile time = 0.456 ms 2025-09-07T06:36:19.1131995Z #22 643.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1139244Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1143179Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1143967Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1144642Z #22 643.7 ptxas info : Compile time = 0.476 ms 2025-09-07T06:36:19.1148709Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:19.1156290Z #22 643.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1160819Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1161771Z #22 643.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:19.1162638Z #22 643.7 ptxas info : Compile time = 0.451 ms 2025-09-07T06:36:19.1163284Z #22 643.7 ptxas info : 11 bytes gmem 2025-09-07T06:36:19.1168325Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1177263Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1181962Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1182796Z #22 643.7 ptxas info : Used 168 
registers, used 9 barriers 2025-09-07T06:36:19.1183460Z #22 643.7 ptxas info : Compile time = 717.908 ms 2025-09-07T06:36:19.1188040Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1196904Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1201567Z #22 643.7 24 bytes stack frame, 52 bytes spill stores, 40 bytes spill loads 2025-09-07T06:36:19.1202696Z #22 643.7 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:36:19.1203693Z #22 643.7 ptxas info : Compile time = 1057.050 ms 2025-09-07T06:36:19.1209059Z #22 643.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1218533Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1223881Z #22 643.7 120 bytes stack frame, 208 bytes spill stores, 304 bytes spill loads 2025-09-07T06:36:19.1224916Z #22 643.7 ptxas info : Used 168 registers, used 16 barriers, 120 bytes cumulative stack size 2025-09-07T06:36:19.1225930Z #22 643.7 ptxas info : Compile time = 2287.799 ms 2025-09-07T06:36:19.1230957Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1239906Z #22 643.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1245223Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1246131Z #22 643.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:19.1246892Z #22 643.7 ptxas info : Compile time = 1491.803 ms 2025-09-07T06:36:19.1252488Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1261970Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1267416Z #22 643.7 24 bytes stack frame, 48 bytes spill stores, 36 bytes spill loads 2025-09-07T06:36:19.1268585Z #22 643.7 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative 
stack size 2025-09-07T06:36:19.1269577Z #22 643.7 ptxas info : Compile time = 1937.056 ms 2025-09-07T06:36:19.1275292Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1285639Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1291143Z #22 643.7 104 bytes stack frame, 152 bytes spill stores, 208 bytes spill loads 2025-09-07T06:36:19.1292430Z #22 643.7 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:36:19.1293552Z #22 643.7 ptxas info : Compile time = 3444.505 ms 2025-09-07T06:36:19.1298593Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:36:19.1307714Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1313174Z #22 643.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:19.1314088Z #22 643.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:19.1314828Z #22 643.7 ptxas info : Compile time = 1023.160 ms 2025-09-07T06:36:19.1320268Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1330076Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1335697Z #22 643.7 24 bytes stack frame, 48 bytes spill stores, 36 bytes spill loads 
2025-09-07T06:36:19.1336882Z #22 643.7 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:36:19.1337872Z #22 643.7 ptxas info : Compile time = 1402.567 ms 2025-09-07T06:36:19.1342976Z #22 643.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:19.1352994Z #22 643.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:19.1357453Z #22 643.7 104 bytes stack frame, 156 bytes spill stores, 204 bytes spill loads 2025-09-07T06:36:19.1358383Z #22 643.7 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:36:19.1359154Z #22 643.7 ptxas info : Compile time = 2709.192 ms 2025-09-07T06:36:45.6330252Z #22 670.4 [52/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include 
-I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:45.7901948Z #22 670.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:45.7907766Z #22 670.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.7917735Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.7923422Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.7924680Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.7925806Z #22 670.4 ptxas info : Compile time = 1.752 ms 2025-09-07T06:36:45.7931488Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.7941771Z #22 670.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.7947241Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.7948499Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.7950256Z #22 670.4 ptxas info : Compile time = 0.956 ms 2025-09-07T06:36:45.7955088Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.7963350Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.7967813Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.7969210Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.7970231Z 
#22 670.4 ptxas info : Compile time = 0.889 ms 2025-09-07T06:36:45.7976193Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.7987399Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.7993411Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.7994603Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.7995587Z #22 670.4 ptxas info : Compile time = 0.646 ms 2025-09-07T06:36:45.8001407Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.8012248Z #22 670.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8018052Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8019173Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:36:45.8020090Z #22 670.4 ptxas info : Compile time = 0.614 ms 2025-09-07T06:36:45.8026049Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.8037078Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8043033Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8044199Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:36:45.8045205Z #22 670.4 ptxas info : Compile time = 0.688 ms 2025-09-07T06:36:45.8051523Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.8062405Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8068456Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8069591Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.8070589Z #22 670.4 ptxas info : Compile time = 0.636 ms 2025-09-07T06:36:45.8076328Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:36:45.8086757Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8094366Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8095575Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:36:45.8096565Z #22 670.4 ptxas info : Compile time = 0.491 ms 2025-09-07T06:36:45.8102479Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.8112873Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8118596Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8119763Z #22 670.4 ptxas 
info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.8120539Z #22 670.4 ptxas info : Compile time = 0.482 ms 2025-09-07T06:36:45.8126241Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:45.8137265Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8142964Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8144107Z #22 670.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:45.8145093Z #22 670.4 ptxas info : Compile time = 0.468 ms 2025-09-07T06:36:45.8145837Z #22 670.4 ptxas info : 11 bytes gmem 2025-09-07T06:36:45.8252137Z #22 670.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8262069Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8267754Z #22 670.4 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:36:45.8269046Z #22 670.4 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:36:45.8270154Z #22 670.4 ptxas info : Compile time = 631.715 ms 2025-09-07T06:36:45.8275828Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8285462Z #22 670.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8290851Z #22 670.4 32 bytes stack frame, 100 bytes spill stores, 104 bytes spill loads 2025-09-07T06:36:45.8292316Z #22 670.4 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:36:45.8293439Z #22 670.4 ptxas info : Compile time = 639.524 ms 2025-09-07T06:36:45.8299247Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8310067Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8315920Z #22 670.4 48 bytes stack frame, 120 bytes spill stores, 128 bytes spill loads 2025-09-07T06:36:45.8317219Z #22 670.4 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative 
stack size 2025-09-07T06:36:45.8318327Z #22 670.4 ptxas info : Compile time = 809.273 ms 2025-09-07T06:36:45.8324170Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8334864Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8340485Z #22 670.4 56 bytes stack frame, 272 bytes spill stores, 300 bytes spill loads 2025-09-07T06:36:45.8341771Z #22 670.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:45.8342903Z #22 670.4 ptxas info : Compile time = 1813.243 ms 2025-09-07T06:36:45.8349056Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:36:45.8359609Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8365087Z #22 670.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:45.8366150Z #22 670.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:45.8367025Z #22 670.4 ptxas info : Compile time = 1181.032 ms 2025-09-07T06:36:45.8372649Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8382775Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8388452Z #22 670.4 48 bytes stack frame, 144 bytes spill stores, 160 bytes spill loads 2025-09-07T06:36:45.8389883Z 
#22 670.4 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:36:45.8390985Z #22 670.4 ptxas info : Compile time = 1327.813 ms 2025-09-07T06:36:45.8397234Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8408831Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8415875Z #22 670.4 56 bytes stack frame, 160 bytes spill stores, 204 bytes spill loads 2025-09-07T06:36:45.8417403Z #22 670.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:45.8418617Z #22 670.4 ptxas info : Compile time = 2609.425 ms 2025-09-07T06:36:45.8424726Z #22 670.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8435833Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8441965Z #22 670.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:36:45.8443347Z #22 670.4 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:36:45.8444384Z #22 670.4 ptxas info : Compile time = 1003.255 ms 2025-09-07T06:36:45.8450647Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8461358Z #22 670.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8466971Z #22 670.4 32 bytes stack frame, 88 bytes spill stores, 104 bytes spill loads 2025-09-07T06:36:45.8468243Z #22 670.4 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:36:45.8469361Z #22 670.4 ptxas info : Compile time = 1147.779 ms 2025-09-07T06:36:45.8475110Z #22 670.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:45.8485850Z #22 670.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:45.8491817Z #22 670.4 40 bytes stack frame, 264 bytes spill stores, 296 bytes spill loads 2025-09-07T06:36:45.8493133Z #22 670.4 ptxas info : Used 168 registers, used 16 
barriers, 40 bytes cumulative stack size 2025-09-07T06:36:45.8494257Z #22 670.4 ptxas info : Compile time = 2296.269 ms 2025-09-07T06:36:58.6694513Z #22 683.4 [53/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a 
-DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:36:58.6712775Z #22 683.4 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:36:58.6717701Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.6726057Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.6730542Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.6731704Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.6732555Z #22 683.4 ptxas info : Compile time = 1.967 ms 2025-09-07T06:36:58.6737691Z #22 683.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.6746888Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.6752395Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.6753385Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.6754248Z #22 683.4 ptxas info : Compile time = 1.293 ms 2025-09-07T06:36:58.6759316Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.6768630Z #22 683.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.6773875Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.6774927Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.6775840Z #22 683.4 ptxas info : Compile time = 1.200 ms 2025-09-07T06:36:58.6780870Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.8189696Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8195044Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8196713Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.8197856Z #22 683.4 ptxas 
info : Compile time = 0.587 ms 2025-09-07T06:36:58.8203799Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.8214611Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8221056Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8222324Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.8223388Z #22 683.4 ptxas info : Compile time = 0.577 ms 2025-09-07T06:36:58.8228991Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.8239443Z #22 683.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8245341Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8246599Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.8247635Z #22 683.4 ptxas info : Compile time = 0.561 ms 2025-09-07T06:36:58.8253253Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.8263102Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8268548Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8269791Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.8270895Z #22 683.4 
ptxas info : Compile time = 0.617 ms 2025-09-07T06:36:58.8276634Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.8286928Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8293290Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8294569Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.8295708Z #22 683.4 ptxas info : Compile time = 0.563 ms 2025-09-07T06:36:58.8301201Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:36:58.8311596Z #22 683.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8317428Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8318739Z #22 683.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:36:58.8334879Z #22 683.4 ptxas info : Compile time = 0.568 ms 2025-09-07T06:36:58.8450632Z #22 683.4 ptxas info : 11 bytes gmem 2025-09-07T06:36:58.8455956Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8464803Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8469677Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8470635Z #22 683.4 ptxas info : Used 168 registers, used 9 barriers 
2025-09-07T06:36:58.8471431Z #22 683.4 ptxas info : Compile time = 610.787 ms 2025-09-07T06:36:58.8477042Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8486717Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8492607Z #22 683.4 56 bytes stack frame, 168 bytes spill stores, 184 bytes spill loads 2025-09-07T06:36:58.8493788Z #22 683.4 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:58.8494796Z #22 683.4 ptxas info : Compile time = 900.318 ms 2025-09-07T06:36:58.8500016Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:36:58.8509751Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8514939Z #22 683.4 80 bytes stack frame, 176 bytes spill stores, 212 bytes spill loads 2025-09-07T06:36:58.8516166Z #22 683.4 ptxas info : Used 168 registers, used 16 barriers, 80 bytes cumulative stack size 2025-09-07T06:36:58.8517169Z #22 683.4 ptxas info : Compile time = 1936.162 ms 2025-09-07T06:36:58.8522295Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8531555Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8536559Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8537496Z #22 683.4 ptxas info : 
Used 168 registers, used 9 barriers 2025-09-07T06:36:58.8538313Z #22 683.4 ptxas info : Compile time = 1005.635 ms 2025-09-07T06:36:58.8543790Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8553636Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8559220Z #22 683.4 56 bytes stack frame, 176 bytes spill stores, 208 bytes spill loads 2025-09-07T06:36:58.8560368Z #22 683.4 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:58.8561363Z #22 683.4 ptxas info : Compile time = 1444.398 ms 2025-09-07T06:36:58.8566642Z #22 683.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8576379Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8581681Z #22 683.4 72 bytes stack frame, 376 bytes spill stores, 424 bytes spill loads 2025-09-07T06:36:58.8582844Z #22 683.4 ptxas info : Used 168 registers, used 16 barriers, 72 bytes cumulative stack size 2025-09-07T06:36:58.8583870Z #22 683.4 ptxas info : Compile time = 2656.713 ms 2025-09-07T06:36:58.8588990Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8598011Z #22 683.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8602765Z #22 683.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:36:58.8603689Z #22 683.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:36:58.8604479Z #22 683.4 ptxas info : Compile time = 856.434 ms 2025-09-07T06:36:58.8609994Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8619825Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8625280Z #22 683.4 56 bytes stack frame, 172 bytes spill stores, 188 bytes spill loads 2025-09-07T06:36:58.8626398Z #22 683.4 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:36:58.8627430Z 
#22 683.4 ptxas info : Compile time = 1191.199 ms 2025-09-07T06:36:58.8632544Z #22 683.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:36:58.8642061Z #22 683.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:36:58.8647419Z #22 683.4 72 bytes stack frame, 164 bytes spill stores, 204 bytes spill loads 2025-09-07T06:36:58.8648618Z #22 683.4 ptxas info : Used 168 registers, used 16 barriers, 72 bytes cumulative stack size 2025-09-07T06:36:58.8649862Z #22 683.4 ptxas info : Compile time = 2292.481 ms 2025-09-07T06:37:09.8368681Z #22 694.6 [54/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:09.8387455Z #22 694.6 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:09.8392588Z #22 694.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8402377Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8407482Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8408607Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8409483Z #22 694.6 ptxas info : Compile time = 1.742 ms 2025-09-07T06:37:09.8414838Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8424237Z #22 694.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8429607Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8430700Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8431596Z #22 694.6 ptxas info : Compile time = 0.857 ms 2025-09-07T06:37:09.8437072Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8447151Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8453325Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8454405Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:37:09.8455306Z #22 694.6 ptxas info : Compile time = 0.782 ms 2025-09-07T06:37:09.8460591Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8470914Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8476287Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8477365Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8478299Z #22 694.6 ptxas info : Compile time = 0.534 ms 2025-09-07T06:37:09.8483464Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:37:09.8493226Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8498804Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8499862Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:37:09.8500770Z #22 694.6 ptxas info : Compile time = 20.841 ms 2025-09-07T06:37:09.8506185Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8516216Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8521878Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:37:09.8522907Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8523823Z #22 694.6 ptxas info : Compile time = 0.668 ms 2025-09-07T06:37:09.8529171Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8541665Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8547143Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8548232Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8549419Z #22 694.6 ptxas info : Compile time = 0.566 ms 2025-09-07T06:37:09.8554629Z #22 694.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8564652Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8570031Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8571242Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:37:09.8572170Z #22 694.6 ptxas info : Compile time = 0.562 ms 2025-09-07T06:37:09.8577583Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8587574Z #22 694.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8593357Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8594452Z #22 694.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8595309Z #22 694.6 ptxas info : Compile time = 0.536 ms 2025-09-07T06:37:09.8600938Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:09.8611237Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8616638Z #22 694.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:09.8617706Z #22 694.6 ptxas info : Used 4 
registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:09.8618648Z #22 694.6 ptxas info : Compile time = 0.505 ms 2025-09-07T06:37:09.8619317Z #22 694.6 ptxas info : 11 bytes gmem 2025-09-07T06:37:09.8624201Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8633526Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8638667Z #22 694.6 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:37:09.8639820Z #22 694.6 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:37:09.8640896Z #22 694.6 ptxas info : Compile time = 751.457 ms 2025-09-07T06:37:09.8645939Z #22 694.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8656177Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8661357Z #22 694.6 32 bytes stack frame, 108 bytes spill stores, 112 bytes spill loads 2025-09-07T06:37:09.8662561Z #22 694.6 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:09.8663549Z #22 694.6 ptxas info : Compile time = 764.144 ms 2025-09-07T06:37:09.8669359Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8679151Z #22 694.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8684659Z #22 694.6 56 bytes stack frame, 156 bytes spill stores, 160 bytes spill loads 2025-09-07T06:37:09.8685866Z #22 694.6 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:37:09.8686896Z #22 694.6 ptxas info : Compile time = 901.528 ms 2025-09-07T06:37:09.8692522Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8702803Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8708299Z #22 694.6 72 bytes stack frame, 308 bytes spill stores, 328 bytes spill loads 2025-09-07T06:37:09.8709509Z #22 694.6 
ptxas info : Used 168 registers, used 16 barriers, 72 bytes cumulative stack size 2025-09-07T06:37:09.8710549Z #22 694.6 ptxas info : Compile time = 1673.393 ms 2025-09-07T06:37:09.8715753Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8725528Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8730861Z #22 694.6 32 bytes stack frame, 88 bytes spill stores, 76 bytes spill loads 2025-09-07T06:37:09.8732141Z #22 694.6 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:09.8733376Z #22 694.6 ptxas info : Compile time = 1436.768 ms 2025-09-07T06:37:09.8738693Z #22 694.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8748650Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8755418Z #22 694.6 48 bytes stack frame, 152 bytes spill stores, 172 bytes spill loads 2025-09-07T06:37:09.8756788Z #22 694.6 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:37:09.8757801Z #22 694.6 ptxas info : Compile time = 1568.133 ms 2025-09-07T06:37:09.8763355Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8773466Z #22 694.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8778795Z #22 694.6 56 bytes stack frame, 308 bytes spill stores, 340 bytes spill loads 2025-09-07T06:37:09.8779961Z #22 694.6 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:37:09.8781023Z #22 694.6 ptxas info : Compile time = 2708.626 ms 2025-09-07T06:37:09.8786269Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.8796360Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.8801664Z #22 694.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:37:09.8803140Z #22 694.6 ptxas info : Used 168 
registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:37:09.8804153Z #22 694.6 ptxas info : Compile time = 1116.869 ms 2025-09-07T06:37:09.8809572Z #22 694.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.9867703Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.9873159Z #22 694.6 32 bytes stack frame, 116 bytes spill stores, 128 bytes spill loads 2025-09-07T06:37:09.9874931Z #22 694.6 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:09.9876248Z #22 694.6 ptxas info : Compile time = 1210.538 ms 2025-09-07T06:37:09.9882162Z #22 694.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:09.9893939Z #22 694.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:09.9899763Z #22 694.6 40 bytes stack frame, 288 bytes spill stores, 320 bytes spill loads 2025-09-07T06:37:09.9901192Z #22 694.6 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:37:09.9902522Z #22 694.6 ptxas info : Compile time = 2303.619 ms 2025-09-07T06:37:11.0302522Z #22 695.8 [55/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:11.0320669Z #22 695.8 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:11.0325487Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0334360Z #22 695.8 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0339110Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0340186Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:11.0341332Z #22 695.8 ptxas info : Compile time = 1.698 ms 2025-09-07T06:37:11.0346132Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0355111Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0360005Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0361082Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:11.0361957Z #22 695.8 ptxas info : Compile time 
= 0.820 ms 2025-09-07T06:37:11.0367132Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0376740Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0381731Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0382771Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:11.0383645Z #22 695.8 ptxas info : Compile time = 0.817 ms 2025-09-07T06:37:11.0388710Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0398175Z #22 695.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0403397Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0404418Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:11.0405333Z #22 695.8 ptxas info : Compile time = 0.528 ms 2025-09-07T06:37:11.0410660Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0419828Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0424551Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0425563Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 
2025-09-07T06:37:11.0426503Z #22 695.8 ptxas info : Compile time = 0.515 ms 2025-09-07T06:37:11.0431816Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0441580Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0446812Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0447867Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:11.0448976Z #22 695.8 ptxas info : Compile time = 0.506 ms 2025-09-07T06:37:11.0454269Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:37:11.0463875Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0469082Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0470135Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:11.0471025Z #22 695.8 ptxas info : Compile time = 0.592 ms 2025-09-07T06:37:11.0476110Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0485234Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0490291Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0491482Z #22 695.8 ptxas info : 
Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:11.0492380Z #22 695.8 ptxas info : Compile time = 0.511 ms 2025-09-07T06:37:11.0497698Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:11.0507267Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0512410Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0513430Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:11.0514335Z #22 695.8 ptxas info : Compile time = 0.531 ms 2025-09-07T06:37:11.0519445Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:37:11.0528839Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0534095Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0535103Z #22 695.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:11.0536018Z #22 695.8 ptxas info : Compile time = 0.517 ms 2025-09-07T06:37:11.0536684Z #22 695.8 ptxas info : 11 bytes gmem 2025-09-07T06:37:11.0541447Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0550316Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0555052Z #22 695.8 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 
2025-09-07T06:37:11.0556221Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:37:11.0557183Z #22 695.8 ptxas info : Compile time = 562.007 ms 2025-09-07T06:37:11.0562193Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0571165Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0575956Z #22 695.8 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:37:11.0577013Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:37:11.0578005Z #22 695.8 ptxas info : Compile time = 562.140 ms 2025-09-07T06:37:11.0582979Z #22 695.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0592361Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0597518Z #22 695.8 40 bytes stack frame, 76 bytes spill stores, 84 bytes spill loads 2025-09-07T06:37:11.0598669Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:37:11.0599667Z #22 695.8 ptxas info : Compile time = 692.707 ms 2025-09-07T06:37:11.0604926Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0614418Z #22 695.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0619330Z #22 695.8 56 bytes stack frame, 280 bytes spill stores, 304 bytes spill loads 2025-09-07T06:37:11.0620391Z #22 695.8 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:37:11.0621276Z #22 695.8 ptxas info : Compile time = 1440.090 ms 2025-09-07T06:37:11.0626113Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0635052Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0640089Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:11.0641039Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers 
2025-09-07T06:37:11.0641789Z #22 695.8 ptxas info : Compile time = 937.894 ms 2025-09-07T06:37:11.0646992Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0757628Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0762955Z #22 695.8 32 bytes stack frame, 148 bytes spill stores, 168 bytes spill loads 2025-09-07T06:37:11.0764109Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:11.0765059Z #22 695.8 ptxas info : Compile time = 1109.294 ms 2025-09-07T06:37:11.0770625Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:37:11.0780275Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0785664Z #22 695.8 40 bytes stack frame, 216 bytes spill stores, 280 bytes spill loads 2025-09-07T06:37:11.0786858Z #22 695.8 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:37:11.0787848Z #22 695.8 ptxas info : Compile time = 2202.742 ms 2025-09-07T06:37:11.0793134Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.0802400Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.0807289Z #22 695.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:37:11.1793574Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:11.1794262Z #22 695.8 ptxas info : Compile time = 829.777 ms 2025-09-07T06:37:11.1798128Z #22 695.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.1805300Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.1809216Z #22 695.8 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:37:11.1810116Z #22 695.8 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:11.1811073Z #22 695.8 ptxas info : Compile time = 1049.750 ms 2025-09-07T06:37:11.1815278Z #22 695.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:11.1822354Z #22 695.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:11.1826261Z #22 695.8 48 bytes stack frame, 252 bytes spill stores, 268 bytes spill loads 2025-09-07T06:37:11.1827156Z #22 695.8 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:37:11.1827940Z #22 695.8 ptxas info : Compile time = 2137.159 ms 2025-09-07T06:37:17.7644109Z #22 702.5 [56/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:17.7663079Z #22 702.5 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:17.7668100Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:37:17.7677418Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7682395Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7683433Z #22 702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7684341Z #22 702.5 ptxas info : Compile time = 1.833 ms 2025-09-07T06:37:17.7689701Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7699807Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7705066Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7707695Z #22 
702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7708588Z #22 702.5 ptxas info : Compile time = 0.993 ms 2025-09-07T06:37:17.7713864Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7723542Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7728876Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7729898Z #22 702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7730784Z #22 702.5 ptxas info : Compile time = 0.840 ms 2025-09-07T06:37:17.7735796Z #22 702.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7745181Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7750367Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7751428Z #22 702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7752304Z #22 702.5 ptxas info : Compile time = 0.584 ms 2025-09-07T06:37:17.7757670Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7767926Z #22 702.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7773450Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7774613Z #22 702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7775626Z #22 702.5 ptxas info : Compile time = 0.578 ms 2025-09-07T06:37:17.7781031Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7790912Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7796242Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7797303Z #22 702.5 ptxas info : Used 4 registers, used 0 
barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7798210Z #22 702.5 ptxas info : Compile time = 0.564 ms 2025-09-07T06:37:17.7803157Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7812688Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7817704Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7818758Z #22 702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7819659Z #22 702.5 ptxas info : Compile time = 0.556 ms 2025-09-07T06:37:17.7825046Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7835079Z #22 702.5 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7840413Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7841548Z #22 702.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7842520Z #22 702.5 ptxas info : Compile time = 0.595 ms 2025-09-07T06:37:17.7848020Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:17.7858318Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7863724Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7864764Z #22 702.5 ptxas info : 
Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:17.7865684Z #22 702.5 ptxas info : Compile time = 0.549 ms 2025-09-07T06:37:17.7866361Z #22 702.5 ptxas info : 11 bytes gmem 2025-09-07T06:37:17.7871326Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.7880929Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7885947Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7886916Z #22 702.5 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:17.7887719Z #22 702.5 ptxas info : Compile time = 655.330 ms 2025-09-07T06:37:17.7893358Z #22 702.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.7903476Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7908864Z #22 702.5 24 bytes stack frame, 52 bytes spill stores, 40 bytes spill loads 2025-09-07T06:37:17.7910258Z #22 702.5 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:37:17.7911395Z #22 702.5 ptxas info : Compile time = 963.394 ms 2025-09-07T06:37:17.7916683Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.7926497Z #22 702.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7932039Z #22 702.5 136 bytes stack frame, 236 bytes spill stores, 348 bytes spill loads 2025-09-07T06:37:17.7933215Z #22 702.5 ptxas info : Used 168 registers, used 16 barriers, 136 bytes cumulative stack size 2025-09-07T06:37:17.7934257Z #22 702.5 ptxas info : Compile time = 2197.207 ms 2025-09-07T06:37:17.7939269Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.7948518Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7953797Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.7954778Z #22 702.5 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:17.7955593Z 
#22 702.5 ptxas info : Compile time = 1371.205 ms 2025-09-07T06:37:17.7961517Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.7972114Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.7977676Z #22 702.5 24 bytes stack frame, 48 bytes spill stores, 36 bytes spill loads 2025-09-07T06:37:17.7978995Z #22 702.5 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:37:17.7980176Z #22 702.5 ptxas info : Compile time = 1865.584 ms 2025-09-07T06:37:17.7985779Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:37:17.7995452Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.8000841Z #22 702.5 104 bytes stack frame, 152 bytes spill stores, 208 bytes spill loads 2025-09-07T06:37:17.8001962Z #22 702.5 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:37:17.8002848Z #22 702.5 ptxas info : Compile time = 3440.334 ms 2025-09-07T06:37:17.8008099Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.8017581Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.8022087Z #22 702.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:17.8023074Z #22 702.5 
ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:17.8023916Z #22 702.5 ptxas info : Compile time = 939.583 ms 2025-09-07T06:37:17.8030161Z #22 702.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.8040370Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.8045963Z #22 702.5 24 bytes stack frame, 48 bytes spill stores, 36 bytes spill loads 2025-09-07T06:37:17.8047248Z #22 702.5 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:37:17.8048384Z #22 702.5 ptxas info : Compile time = 1365.681 ms 2025-09-07T06:37:17.8054398Z #22 702.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:17.8064137Z #22 702.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:17.8069829Z #22 702.5 104 bytes stack frame, 156 bytes spill stores, 204 bytes spill loads 2025-09-07T06:37:17.8070857Z #22 702.5 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:37:17.8071723Z #22 702.5 ptxas info : Compile time = 2766.817 ms 2025-09-07T06:37:19.7055432Z #22 704.5 [57/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:19.7069210Z #22 704.5 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:19.7073109Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7079921Z #22 704.5 
ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7083557Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7084339Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7085030Z #22 704.5 ptxas info : Compile time = 1.672 ms 2025-09-07T06:37:19.7088958Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7096376Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7100312Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7101111Z #22 704.5 ptxas info : Used 4 registers, used 0 
barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7101781Z #22 704.5 ptxas info : Compile time = 0.804 ms 2025-09-07T06:37:19.7105861Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7113098Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7117051Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7117833Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7118514Z #22 704.5 ptxas info : Compile time = 0.725 ms 2025-09-07T06:37:19.7123127Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:37:19.7130273Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7134315Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7135118Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:37:19.7135780Z #22 704.5 ptxas info : Compile time = 0.513 ms 2025-09-07T06:37:19.7139715Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7147128Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7151520Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:37:19.7152307Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7152990Z #22 704.5 ptxas info : Compile time = 0.508 ms 2025-09-07T06:37:19.7157071Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7164300Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7168251Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7169047Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7169716Z #22 704.5 ptxas info : Compile time = 0.486 ms 2025-09-07T06:37:19.7173868Z #22 704.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7181016Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7184838Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7185612Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:37:19.7186291Z #22 704.5 ptxas info : Compile time = 0.534 ms 2025-09-07T06:37:19.7190191Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7197414Z #22 704.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7201382Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7202183Z #22 704.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7202862Z #22 704.5 ptxas info : Compile time = 0.492 ms 2025-09-07T06:37:19.7206984Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:19.7214379Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7218309Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7219073Z #22 704.5 ptxas info : Used 4 registers, used 0 
barriers, 2560 bytes cmem[0] 2025-09-07T06:37:19.7219764Z #22 704.5 ptxas info : Compile time = 0.495 ms 2025-09-07T06:37:19.7220398Z #22 704.5 ptxas info : 11 bytes gmem 2025-09-07T06:37:19.7223978Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7231253Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7234898Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7235594Z #22 704.5 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:19.7236192Z #22 704.5 ptxas info : Compile time = 724.020 ms 2025-09-07T06:37:19.7240127Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:37:19.7247394Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7251714Z #22 704.5 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:37:19.7252597Z #22 704.5 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:37:19.7253533Z #22 704.5 ptxas info : Compile time = 911.717 ms 2025-09-07T06:37:19.7257483Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7264690Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7268665Z #22 704.5 40 bytes stack frame, 92 bytes spill stores, 136 
bytes spill loads 2025-09-07T06:37:19.7269542Z #22 704.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:37:19.7270324Z #22 704.5 ptxas info : Compile time = 2079.685 ms 2025-09-07T06:37:19.7274384Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7281503Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7286457Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7287180Z #22 704.5 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:19.7287773Z #22 704.5 ptxas info : Compile time = 1612.403 ms 2025-09-07T06:37:19.7292004Z #22 704.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7299390Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7303486Z #22 704.5 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:37:19.7304415Z #22 704.5 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:37:19.7305334Z #22 704.5 ptxas info : Compile time = 1651.262 ms 2025-09-07T06:37:19.7309377Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7316728Z #22 704.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7320763Z #22 704.5 56 bytes stack frame, 120 bytes spill stores, 152 bytes spill loads 2025-09-07T06:37:19.7321650Z #22 704.5 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:37:19.7322578Z #22 704.5 ptxas info : Compile time = 3424.898 ms 2025-09-07T06:37:19.7326415Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7333739Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7337589Z #22 704.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:19.7338308Z #22 704.5 ptxas info : Used 168 registers, used 9 
barriers 2025-09-07T06:37:19.7338904Z #22 704.5 ptxas info : Compile time = 1298.967 ms 2025-09-07T06:37:19.7342860Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:19.7350553Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7354507Z #22 704.5 8 bytes stack frame, 36 bytes spill stores, 24 bytes spill loads 2025-09-07T06:37:19.7355366Z #22 704.5 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:37:19.7356314Z #22 704.5 ptxas info : Compile time = 1297.421 ms 2025-09-07T06:37:19.7360257Z #22 704.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 
for 'sm_90a' 2025-09-07T06:37:19.7367682Z #22 704.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:19.7371800Z #22 704.5 40 bytes stack frame, 92 bytes spill stores, 132 bytes spill loads 2025-09-07T06:37:19.7372694Z #22 704.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:37:19.7373643Z #22 704.5 ptxas info : Compile time = 2638.531 ms 2025-09-07T06:37:20.4971088Z #22 705.3 [58/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH 
--use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:20.4988071Z #22 705.3 ptxas info : 11 bytes gmem 2025-09-07T06:37:20.4992703Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5001093Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5005576Z #22 705.3 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5006424Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5007165Z #22 705.3 ptxas info : Compile time = 2.222 ms 2025-09-07T06:37:20.5011761Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5019999Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5024739Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5025602Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5026301Z #22 705.3 ptxas info : Compile time = 1.109 ms 2025-09-07T06:37:20.5030902Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:37:20.5039167Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5043733Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5044581Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5045285Z #22 705.3 ptxas info : Compile time = 0.732 ms 2025-09-07T06:37:20.5050128Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5058648Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5063102Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5063977Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5064679Z #22 705.3 
ptxas info : Compile time = 0.664 ms 2025-09-07T06:37:20.5069134Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5077492Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5081878Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5082750Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5083588Z #22 705.3 ptxas info : Compile time = 0.927 ms 2025-09-07T06:37:20.5088064Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5096346Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5100793Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5101618Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5102349Z #22 705.3 ptxas info : Compile time = 0.619 ms 2025-09-07T06:37:20.5106982Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5115767Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5120387Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5121261Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5121941Z #22 705.3 ptxas info : Compile time = 0.639 ms 
2025-09-07T06:37:20.5126301Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5134723Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5139265Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5140086Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5140783Z #22 705.3 ptxas info : Compile time = 0.608 ms 2025-09-07T06:37:20.5145137Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5153677Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5158124Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5158978Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5159678Z #22 705.3 ptxas info : Compile time = 0.605 ms 2025-09-07T06:37:20.5164082Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5172201Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5176587Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5177390Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5178297Z #22 705.3 ptxas info : Compile time = 0.609 ms 2025-09-07T06:37:20.5182703Z #22 705.3 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5190730Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5195234Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5196092Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5196773Z #22 705.3 ptxas info : Compile time = 0.609 ms 2025-09-07T06:37:20.5201365Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5209752Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5214484Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5215337Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5216053Z #22 705.3 ptxas info : Compile time = 0.611 ms 2025-09-07T06:37:20.5220385Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5228574Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5233019Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5233907Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5234603Z #22 705.3 ptxas info : Compile time = 0.645 ms 2025-09-07T06:37:20.5239220Z #22 705.3 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5247302Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5257411Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5258276Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5258968Z #22 705.3 ptxas info : Compile time = 0.604 ms 2025-09-07T06:37:20.5263542Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5271441Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5276118Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5276963Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5277664Z #22 705.3 ptxas info : Compile time = 0.636 ms 2025-09-07T06:37:20.5282332Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5290811Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5295531Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5296368Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5297051Z #22 705.3 ptxas info : Compile time = 0.607 ms 
2025-09-07T06:37:20.5301619Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5309461Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5313767Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5314593Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5315278Z #22 705.3 ptxas info : Compile time = 0.614 ms 2025-09-07T06:37:20.5320045Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:20.5328171Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5332742Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5333579Z #22 705.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:20.5334243Z #22 705.3 ptxas info : Compile time = 0.587 ms 2025-09-07T06:37:20.5334925Z #22 705.3 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:20.5339363Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5347215Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5351760Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5352685Z #22 705.3 ptxas info : Used 238 registers, used 2 barriers, 1400 bytes cmem[0] 
2025-09-07T06:37:20.5353477Z #22 705.3 ptxas info : Compile time = 643.706 ms 2025-09-07T06:37:20.5357776Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5365823Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5370093Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5371167Z #22 705.3 ptxas info : Used 230 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5371977Z #22 705.3 ptxas info : Compile time = 722.370 ms 2025-09-07T06:37:20.5376245Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5384456Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5388978Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5389919Z #22 705.3 ptxas info : Used 234 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5390738Z #22 705.3 ptxas info : Compile time = 1389.370 ms 2025-09-07T06:37:20.5395207Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5403301Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5407604Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5408521Z #22 705.3 ptxas info : Used 237 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5409350Z #22 705.3 ptxas info : Compile time = 1567.570 ms 
2025-09-07T06:37:20.5413904Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5421873Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5426286Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5427233Z #22 705.3 ptxas info : Used 237 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5428041Z #22 705.3 ptxas info : Compile time = 1636.684 ms 2025-09-07T06:37:20.5432442Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5440535Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5444819Z #22 705.3 112 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5446098Z #22 705.3 ptxas info : Used 250 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:37:20.5447181Z #22 705.3 ptxas info : Compile time = 3743.462 ms 2025-09-07T06:37:20.5453641Z #22 705.3 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEENS3_ILi192EEEEEELi192EN7cutlass10bfloat16_tEfNS8_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISC_NS_21CollectiveEpilogueFwdINS2_IJS4_S6_S5_EEENS2_IJNS3_ILi1EEESH_SH_EEES9_SB_Li256ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm96EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESS_EEESH_NS3_ILi24EEEEEENS2_IJNS2_IJSH_SS_EEENS3_ILi0EEENS3_ILi4EEEEEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSC_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:37:20.5459944Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5464718Z #22 705.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5473474Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5478117Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5479082Z #22 705.3 ptxas info : Used 251 registers, used 6 barriers, 1416 bytes cmem[0] 2025-09-07T06:37:20.5480219Z #22 705.3 ptxas info : Compile time = 1345.557 ms 2025-09-07T06:37:20.5484512Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5492565Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5496816Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5497742Z #22 705.3 ptxas info : Used 241 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5498521Z #22 705.3 ptxas info : Compile time = 1168.589 ms 2025-09-07T06:37:20.5503121Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5511415Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5515779Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5516764Z #22 705.3 ptxas info : Used 236 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5517582Z #22 705.3 ptxas info : Compile time = 2606.906 ms 
2025-09-07T06:37:20.5521722Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5530412Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5535043Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5536019Z #22 705.3 ptxas info : Used 244 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5536869Z #22 705.3 ptxas info : Compile time = 695.673 ms 2025-09-07T06:37:20.5541598Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5550112Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5554592Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5555571Z #22 705.3 ptxas info : Used 246 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5556401Z #22 705.3 ptxas info : Compile time = 836.566 ms 2025-09-07T06:37:20.5561018Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5569090Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5574162Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5575125Z #22 705.3 ptxas info : Used 223 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5575970Z #22 705.3 ptxas info : Compile time = 1862.658 ms 
2025-09-07T06:37:20.5580324Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5588325Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5592638Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5593616Z #22 705.3 ptxas info : Used 236 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5594443Z #22 705.3 ptxas info : Compile time = 1911.168 ms 2025-09-07T06:37:20.5598901Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5606771Z #22 705.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5611277Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5612204Z #22 705.3 ptxas info : Used 234 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5613035Z #22 705.3 ptxas info : Compile time = 2204.079 ms 2025-09-07T06:37:20.5617459Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5625330Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5629856Z #22 705.3 112 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5631045Z #22 705.3 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:37:20.5632086Z #22 705.3 ptxas 
info : Compile time = 4526.309 ms 2025-09-07T06:37:20.5638365Z #22 705.3 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEENS3_ILi192EEEEEELi192EN7cutlass10bfloat16_tEfNS8_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISC_NS_21CollectiveEpilogueFwdINS2_IJS4_S6_S5_EEENS2_IJNS3_ILi1EEESH_SH_EEES9_SB_Li256ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm96EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESS_EEESH_NS3_ILi24EEEEEENS2_IJNS2_IJSH_SS_EEENS3_ILi0EEENS3_ILi4EEEEEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSC_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:37:20.5644688Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5649772Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5658365Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5662957Z #22 705.3 56 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 
2025-09-07T06:37:20.5664131Z #22 705.3 ptxas info : Used 255 registers, used 6 barriers, 56 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:37:20.5665173Z #22 705.3 ptxas info : Compile time = 1733.309 ms 2025-09-07T06:37:20.5669477Z #22 705.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5677708Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5682170Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5683107Z #22 705.3 ptxas info : Used 232 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5683935Z #22 705.3 ptxas info : Compile time = 1452.228 ms 2025-09-07T06:37:20.5688550Z #22 705.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:20.5696993Z #22 705.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:20.5701338Z #22 705.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:20.5702305Z #22 705.3 ptxas info : Used 235 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:20.5703170Z #22 705.3 ptxas info : Compile time = 2857.319 ms 2025-09-07T06:37:46.0043200Z #22 730.8 [59/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.cu -o 
/workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:46.1593081Z #22 730.8 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:46.1598932Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1607795Z #22 730.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1612715Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1613617Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:46.1614358Z #22 730.8 ptxas info : Compile time = 1.638 ms 2025-09-07T06:37:46.1619247Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1628867Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1634158Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1635168Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:46.1636137Z #22 730.8 
ptxas info : Compile time = 0.879 ms 2025-09-07T06:37:46.1641869Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1653777Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1659597Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1660702Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:46.1661663Z #22 730.8 ptxas info : Compile time = 0.753 ms 2025-09-07T06:37:46.1667382Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1677856Z #22 730.8 
ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1683206Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1684269Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:46.1685173Z #22 730.8 ptxas info : Compile time = 0.509 ms 2025-09-07T06:37:46.1690310Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1700335Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1705693Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1706857Z #22 730.8 ptxas info : Used 4 
registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:46.1707797Z #22 730.8 ptxas info : Compile time = 0.478 ms 2025-09-07T06:37:46.1713323Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1723973Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1729510Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1730686Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:46.1731826Z #22 730.8 ptxas info : Compile time = 0.467 ms 2025-09-07T06:37:46.1737780Z #22 730.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1748245Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1754446Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1755527Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:46.1756435Z #22 730.8 ptxas info : Compile time = 0.495 ms 2025-09-07T06:37:46.1761991Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1772592Z #22 730.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1778288Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1779415Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:37:46.1780421Z #22 730.8 ptxas info : Compile time = 0.493 ms 2025-09-07T06:37:46.1786435Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1796696Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1802343Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1803435Z #22 730.8 ptxas info : Used 4 registers, used 0 
barriers, 2496 bytes cmem[0] 2025-09-07T06:37:46.1804380Z #22 730.8 ptxas info : Compile time = 0.461 ms 2025-09-07T06:37:46.1808804Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:46.1818418Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1824091Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1825176Z #22 730.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:37:46.1826108Z #22 730.8 ptxas info : Compile time = 0.455 ms 2025-09-07T06:37:46.1826793Z #22 730.8 ptxas info : 11 bytes gmem 2025-09-07T06:37:46.1830923Z #22 730.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1840561Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1845927Z #22 730.8 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:37:46.1847133Z #22 730.8 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:37:46.1848103Z #22 730.8 ptxas info : Compile time = 740.057 ms 2025-09-07T06:37:46.1856630Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1866958Z #22 730.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1872420Z #22 730.8 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:37:46.1873679Z #22 730.8 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:37:46.1874764Z #22 730.8 ptxas info : Compile time = 739.924 ms 2025-09-07T06:37:46.1880682Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1890807Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1896971Z #22 730.8 40 bytes stack frame, 84 bytes spill stores, 100 bytes spill loads 2025-09-07T06:37:46.1898152Z #22 730.8 ptxas info : Used 168 
registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:37:46.1899220Z #22 730.8 ptxas info : Compile time = 911.474 ms 2025-09-07T06:37:46.1904884Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1913866Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1919198Z #22 730.8 56 bytes stack frame, 280 bytes spill stores, 304 bytes spill loads 2025-09-07T06:37:46.1920388Z #22 730.8 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:37:46.1921416Z #22 730.8 ptxas info : Compile time = 1803.653 ms 2025-09-07T06:37:46.1925731Z #22 730.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1934321Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1938726Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.1939596Z #22 730.8 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:46.1940285Z #22 730.8 ptxas info : Compile time = 1454.362 ms 2025-09-07T06:37:46.1945856Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1954507Z #22 730.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1959269Z #22 730.8 32 bytes stack frame, 124 bytes spill stores, 136 bytes spill loads 2025-09-07T06:37:46.1960332Z #22 730.8 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:46.1961145Z #22 730.8 ptxas info : Compile time = 2101.819 ms 2025-09-07T06:37:46.1966120Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1974832Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1979622Z #22 730.8 48 bytes stack frame, 228 bytes spill stores, 284 bytes spill loads 2025-09-07T06:37:46.1980606Z #22 730.8 ptxas 
info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:37:46.1981588Z #22 730.8 ptxas info : Compile time = 2816.565 ms 2025-09-07T06:37:46.1986340Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.1995245Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.1999548Z #22 730.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:46.2000414Z #22 730.8 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:37:46.2001160Z #22 730.8 ptxas info : Compile time = 1103.755 ms 2025-09-07T06:37:46.2005987Z #22 730.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.2015429Z #22 730.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.2020368Z #22 730.8 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:37:46.2021251Z #22 730.8 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:37:46.2022072Z #22 730.8 ptxas info : Compile time = 1320.131 ms 2025-09-07T06:37:46.2026766Z #22 730.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:46.2035275Z #22 730.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:46.2040570Z #22 730.8 48 bytes stack frame, 312 bytes spill stores, 348 bytes spill loads 2025-09-07T06:37:46.2041777Z #22 730.8 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:37:46.2042812Z #22 730.8 ptxas info : Compile time = 2338.218 ms 2025-09-07T06:37:53.9767430Z #22 738.8 [60/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
--extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:37:54.1331618Z #22 738.8 ptxas info : 11 bytes gmem 2025-09-07T06:37:54.1335299Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1341916Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1345548Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:37:54.1346244Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1346847Z #22 738.8 ptxas info : Compile time = 2.196 ms 2025-09-07T06:37:54.1363585Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1372736Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1377634Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1378506Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1379193Z #22 738.8 ptxas info : Compile time = 1.121 ms 2025-09-07T06:37:54.1383723Z #22 738.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1392863Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1398035Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1399262Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1399990Z #22 738.8 ptxas info : Compile time = 0.983 ms 2025-09-07T06:37:54.1404814Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1413371Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1418346Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1419170Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1419937Z #22 738.8 ptxas info : Compile time = 0.652 ms 2025-09-07T06:37:54.1425077Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1434843Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1439978Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1440969Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1441731Z #22 738.8 ptxas info : Compile time = 
0.601 ms 2025-09-07T06:37:54.1447134Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1459430Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1464731Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1465630Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1466649Z #22 738.8 ptxas info : Compile time = 0.568 ms 2025-09-07T06:37:54.1472053Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1481068Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1486131Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1487027Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1487802Z #22 738.8 ptxas info : Compile time = 0.598 ms 2025-09-07T06:37:54.1493170Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1503307Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1508715Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1509739Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1510535Z #22 738.8 ptxas info 
: Compile time = 0.582 ms 2025-09-07T06:37:54.1515371Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1524550Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1529752Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1530738Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1531725Z #22 738.8 ptxas info : Compile time = 0.558 ms 2025-09-07T06:37:54.1535599Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1542262Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1545865Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1546576Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1547167Z #22 738.8 ptxas info : Compile time = 0.556 ms 2025-09-07T06:37:54.1551327Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1558419Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1562476Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1563202Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1563806Z #22 738.8 ptxas info : Compile time = 
0.572 ms 2025-09-07T06:37:54.1567677Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1574933Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1578974Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1579676Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1580265Z #22 738.8 ptxas info : Compile time = 0.556 ms 2025-09-07T06:37:54.1583862Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1590677Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1594227Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1594943Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1595524Z #22 738.8 ptxas info : Compile time = 0.585 ms 2025-09-07T06:37:54.1599409Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1606543Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1610487Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1611547Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1612148Z #22 738.8 ptxas info : Compile time = 
0.559 ms 2025-09-07T06:37:54.1616027Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1623125Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1627010Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1627730Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1628458Z #22 738.8 ptxas info : Compile time = 0.547 ms 2025-09-07T06:37:54.1632212Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1639241Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1642976Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1643662Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1644261Z #22 738.8 ptxas info : Compile time = 0.582 ms 2025-09-07T06:37:54.1648156Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1655724Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1659585Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1660286Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1661053Z #22 738.8 ptxas info 
: Compile time = 0.554 ms 2025-09-07T06:37:54.1664931Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:37:54.1672048Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1675920Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1676611Z #22 738.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:37:54.1677214Z #22 738.8 ptxas info : Compile time = 0.561 ms 2025-09-07T06:37:54.1677968Z #22 738.8 ptxas info : 11 bytes gmem, 88 bytes cmem[4] 2025-09-07T06:37:54.1681604Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1688305Z #22 738.8 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1692121Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1692899Z #22 738.8 ptxas info : Used 226 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:54.1693588Z #22 738.8 ptxas info : Compile time = 778.382 ms 2025-09-07T06:37:54.1697492Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1704610Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1708542Z #22 738.8 24 bytes stack frame, 20 bytes spill stores, 24 bytes spill loads 2025-09-07T06:37:54.1709536Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 24 bytes 
cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1710558Z #22 738.8 ptxas info : Compile time = 1548.639 ms 2025-09-07T06:37:54.1714431Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1721510Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1725433Z #22 738.8 104 bytes stack frame, 120 bytes spill stores, 124 bytes spill loads 2025-09-07T06:37:54.1726424Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 104 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1727447Z #22 738.8 ptxas info : Compile time = 3105.793 ms 2025-09-07T06:37:54.1731202Z #22 738.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1737873Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1741459Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1742254Z #22 738.8 ptxas info : Used 235 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:54.1742947Z #22 738.8 ptxas info : Compile time = 1947.790 ms 2025-09-07T06:37:54.1746837Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1756489Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1762309Z #22 738.8 16 bytes stack frame, 12 bytes spill stores, 16 bytes spill loads 2025-09-07T06:37:54.1763937Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 16 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1765254Z #22 738.8 ptxas info : Compile time = 4060.566 ms 2025-09-07T06:37:54.1771301Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1781599Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1787598Z #22 738.8 376 bytes stack frame, 300 bytes spill stores, 632 bytes spill loads 2025-09-07T06:37:54.1789020Z #22 738.8 ptxas info : Used 
255 registers, used 6 barriers, 376 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1790579Z #22 738.8 ptxas info : Compile time = 6974.607 ms 2025-09-07T06:37:54.1798875Z #22 738.8 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEENS3_ILi192EEEEEELi192EN7cutlass10bfloat16_tEfNS8_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EE3mmaINS_16FlashAttnFwdSm80ISC_NS_21CollectiveEpilogueFwdINS2_IJS4_S6_S5_EEENS2_IJNS3_ILi1EEESH_SH_EEES9_SB_Li256ELb1ELb1ELb1ELb0EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm96EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESS_EEESH_NS3_ILi24EEEEEENS2_IJNS2_IJSH_SS_EEENS3_ILi0EEENS3_ILi4EEEEEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSC_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:37:54.1807456Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1813549Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1823447Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1829168Z #22 738.8 16 bytes stack frame, 12 bytes spill stores, 16 bytes spill loads 2025-09-07T06:37:54.1830603Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 16 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:37:54.1831881Z #22 738.8 ptxas info : Compile time = 1762.657 ms 2025-09-07T06:37:54.1837859Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1848666Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1854857Z #22 738.8 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:37:54.1856306Z #22 738.8 ptxas info : Used 255 registers, 
used 6 barriers, 16 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1857573Z #22 738.8 ptxas info : Compile time = 2595.345 ms 2025-09-07T06:37:54.1863421Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1873825Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1880018Z #22 738.8 88 bytes stack frame, 100 bytes spill stores, 104 bytes spill loads 2025-09-07T06:37:54.1881480Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 88 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1882773Z #22 738.8 ptxas info : Compile time = 4863.980 ms 2025-09-07T06:37:54.1888230Z #22 738.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1898019Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1903387Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1904548Z #22 738.8 ptxas info : Used 245 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:54.1905530Z #22 738.8 ptxas info : Compile time = 806.523 ms 2025-09-07T06:37:54.1911600Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1921993Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1927799Z #22 738.8 24 bytes stack frame, 20 bytes spill stores, 24 bytes spill loads 2025-09-07T06:37:54.1929227Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 24 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1930486Z #22 738.8 ptxas info : Compile time = 1682.215 ms 2025-09-07T06:37:54.1936679Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1947346Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1952112Z #22 738.8 96 bytes stack frame, 108 bytes spill stores, 112 bytes spill loads 2025-09-07T06:37:54.1953112Z #22 738.8 ptxas info : Used 255 
registers, used 6 barriers, 96 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1953985Z #22 738.8 ptxas info : Compile time = 3386.113 ms 2025-09-07T06:37:54.1957599Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1964177Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1967819Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.1968639Z #22 738.8 ptxas info : Used 228 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:37:54.1969347Z #22 738.8 ptxas info : Compile time = 2044.827 ms 2025-09-07T06:37:54.1973581Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:37:54.1980805Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.1984833Z #22 738.8 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:37:54.1985827Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 16 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.1986687Z #22 738.8 ptxas info : Compile time = 3968.021 ms 2025-09-07T06:37:54.1990763Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.1997864Z #22 738.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.2001993Z #22 738.8 312 bytes stack frame, 232 bytes spill stores, 444 
bytes spill loads 2025-09-07T06:37:54.2002997Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 312 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:37:54.2003885Z #22 738.8 ptxas info : Compile time = 6130.277 ms 2025-09-07T06:37:54.2009234Z #22 738.8 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEENS3_ILi192EEEEEELi192EN7cutlass10bfloat16_tEfNS8_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EE3mmaINS_16FlashAttnFwdSm80ISC_NS_21CollectiveEpilogueFwdINS2_IJS4_S6_S5_EEENS2_IJNS3_ILi1EEESH_SH_EEES9_SB_Li256ELb1ELb1ELb1ELb0EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm96EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESS_EEESH_NS3_ILi24EEEEEENS2_IJNS2_IJSH_SS_EEENS3_ILi0EEENS3_ILi4EEEEEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSC_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:37:54.2014917Z #22 738.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:37:54.2018779Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:37:54.2025843Z #22 738.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:37:54.2029612Z #22 738.8 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:37:54.2030577Z #22 738.8 ptxas info : Used 255 registers, used 6 barriers, 16 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:37:54.2031446Z #22 738.8 ptxas info : Compile time = 1877.931 ms 2025-09-07T06:37:54.2033344Z #22 738.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat 2025-09-07T06:37:54.2035292Z #22 738.8 [output clipped, log limit 2MiB reached] 2025-09-07T07:00:51.2014089Z #22 2116.0 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:90: SetuptoolsDeprecationWarning: setup.py install is deprecated. 2025-09-07T07:00:51.2015068Z #22 2116.0 !! 2025-09-07T07:00:51.2015301Z #22 2116.0 2025-09-07T07:00:51.2015623Z #22 2116.0 ******************************************************************************** 2025-09-07T07:00:51.2016349Z #22 2116.0 Please avoid running ``setup.py`` directly. 2025-09-07T07:00:51.2016871Z #22 2116.0 Instead, use pypa/build, pypa/installer or other 2025-09-07T07:00:51.2017334Z #22 2116.0 standards-based tools. 2025-09-07T07:00:51.2017667Z #22 2116.0 2025-09-07T07:00:51.2018179Z #22 2116.0 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details. 
2025-09-07T07:00:51.2019000Z #22 2116.0 ******************************************************************************** 2025-09-07T07:00:51.2019401Z #22 2116.0 2025-09-07T07:00:51.2019637Z #22 2116.0 !! 2025-09-07T07:00:51.2019897Z #22 2116.0 self.initialize_options() 2025-09-07T07:00:52.1233693Z #22 2116.9 warning: no files found matching 'third_party/flash-attention/version.txt' 2025-09-07T07:01:47.2420242Z #22 DONE 2172.0s 2025-09-07T07:01:47.3933945Z 2025-09-07T07:01:47.3934866Z #23 [base 17/20] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system xformers-dist/*.whl --verbose 2025-09-07T07:01:47.6976797Z #23 0.455 DEBUG uv 0.8.4 2025-09-07T07:01:47.9008652Z #23 0.459 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T07:01:47.9009576Z #23 0.459 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T07:01:47.9010606Z #23 0.461 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at `/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T07:01:47.9011809Z #23 0.461 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T07:01:47.9012473Z #23 0.461 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T07:01:47.9013482Z #23 0.466 DEBUG At least one requirement is not satisfied: file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T07:01:47.9014485Z #23 0.466 DEBUG Using request timeout of 500s 2025-09-07T07:01:47.9014990Z #23 0.473 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T07:01:47.9015571Z #23 0.473 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T07:01:47.9016111Z #23 0.474 DEBUG Adding direct dependency: xformers* 2025-09-07T07:01:47.9017128Z #23 0.474 DEBUG Searching for a compatible version of xformers @ file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl (*) 2025-09-07T07:01:47.9018381Z #23 
0.474 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: torch>=2.8 2025-09-07T07:01:47.9019194Z #23 0.474 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: numpy* 2025-09-07T07:01:47.9020139Z #23 0.476 DEBUG Found stale response for: https://pypi.org/simple/torch/ 2025-09-07T07:01:47.9020818Z #23 0.476 DEBUG Sending revalidation request for: https://pypi.org/simple/torch/ 2025-09-07T07:01:47.9021474Z #23 0.478 DEBUG Found stale response for: https://pypi.org/simple/numpy/ 2025-09-07T07:01:47.9022136Z #23 0.478 DEBUG Sending revalidation request for: https://pypi.org/simple/numpy/ 2025-09-07T07:01:47.9022840Z #23 0.487 DEBUG Found not-modified response for: https://pypi.org/simple/numpy/ 2025-09-07T07:01:47.9023624Z #23 0.490 DEBUG Found not-modified response for: https://pypi.org/simple/torch/ 2025-09-07T07:01:47.9024236Z #23 0.491 DEBUG Searching for a compatible version of torch (>=2.8) 2025-09-07T07:01:47.9025272Z #23 0.491 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T07:01:47.9026364Z #23 0.491 DEBUG Selecting: torch==2.9.0.dev20250901+cu129 [installed] (installed) 2025-09-07T07:01:47.9027455Z #23 0.492 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T07:01:47.9028496Z #23 0.492 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T07:01:47.9029158Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: filelock* 2025-09-07T07:01:47.9030032Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: typing-extensions>=4.10.0 2025-09-07T07:01:47.9031183Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: setuptools{python_full_version >= '3.12'}* 
2025-09-07T07:01:47.9032119Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: sympy>=1.13.3 2025-09-07T07:01:47.9032925Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: networkx>=2.5.1 2025-09-07T07:01:47.9033662Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: jinja2* 2025-09-07T07:01:47.9034376Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: fsspec>=0.8.5 2025-09-07T07:01:47.9035490Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T07:01:47.9036965Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T07:01:47.9038419Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T07:01:47.9039862Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T07:01:47.9041284Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.1.4, <12.9.1.4+ 2025-09-07T07:01:47.9042688Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.4.1.4, <11.4.1.4+ 2025-09-07T07:01:47.9044115Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.10.19, <10.3.10.19+ 2025-09-07T07:01:47.9045566Z #23 
0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.5.82, <11.7.5.82+ 2025-09-07T07:01:47.9047081Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.10.65, <12.5.10.65+ 2025-09-07T07:01:47.9048549Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T07:01:47.9050511Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T07:01:47.9052053Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T07:01:47.9053540Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T07:01:47.9055043Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T07:01:47.9056575Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.14.1.1, <1.14.1.1+ 2025-09-07T07:01:47.9057931Z #23 0.493 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T07:01:47.9058927Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/filelock/ 2025-09-07T07:01:47.9059632Z #23 0.495 DEBUG Sending revalidation request for: 
https://pypi.org/simple/filelock/ 2025-09-07T07:01:47.9060377Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T07:01:47.9061196Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-extensions/ 2025-09-07T07:01:47.9061969Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/sympy/ 2025-09-07T07:01:47.9062617Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/sympy/ 2025-09-07T07:01:47.9063380Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/networkx/ 2025-09-07T07:01:47.9064019Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/networkx/ 2025-09-07T07:01:47.9064666Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/jinja2/ 2025-09-07T07:01:47.9065302Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/jinja2/ 2025-09-07T07:01:47.9065921Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/fsspec/ 2025-09-07T07:01:47.9066552Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/fsspec/ 2025-09-07T07:01:47.9067248Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T07:01:47.9068033Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T07:01:47.9068821Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T07:01:47.9069604Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T07:01:47.9070388Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T07:01:47.9071154Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T07:01:47.9071908Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 
2025-09-07T07:01:47.9072627Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T07:01:47.9073359Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T07:01:47.9074101Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T07:01:47.9074823Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T07:01:47.9075606Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T07:01:47.9076329Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T07:01:47.9077064Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T07:01:47.9077818Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T07:01:47.9078569Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T07:01:47.9079328Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T07:01:47.9080079Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T07:01:47.9080852Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T07:01:47.9081638Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T07:01:47.9082378Z #23 0.495 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T07:01:47.9083105Z #23 0.495 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T07:01:47.9083830Z #23 0.496 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T07:01:47.9084615Z #23 0.496 DEBUG Sending revalidation request for: 
https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T07:01:47.9085341Z #23 0.496 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T07:01:47.9086066Z #23 0.496 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T07:01:47.9086875Z #23 0.496 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T07:01:47.9087630Z #23 0.496 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T07:01:47.9088566Z #23 0.496 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T07:01:47.9089318Z #23 0.496 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T07:01:47.9090070Z #23 0.496 DEBUG Found stale response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T07:01:47.9090805Z #23 0.496 DEBUG Sending revalidation request for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T07:01:47.9091767Z #23 0.496 DEBUG Found stale response for: https://pypi.org/simple/setuptools/ 2025-09-07T07:01:47.9092487Z #23 0.496 DEBUG Sending revalidation request for: https://pypi.org/simple/setuptools/ 2025-09-07T07:01:47.9093206Z #23 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/sympy/ 2025-09-07T07:01:47.9093924Z #23 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/networkx/ 2025-09-07T07:01:47.9094637Z #23 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/jinja2/ 2025-09-07T07:01:47.9095331Z #23 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/filelock/ 2025-09-07T07:01:47.9096240Z #23 0.497 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T07:01:47.9097324Z #23 0.498 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 
2025-09-07T07:01:47.9098392Z #23 0.498 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T07:01:47.9099439Z #23 0.498 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T07:01:47.9100399Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T07:01:47.9101184Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/fsspec/ 2025-09-07T07:01:47.9101981Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T07:01:47.9102808Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T07:01:47.9103700Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T07:01:47.9104502Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T07:01:47.9105302Z #23 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T07:01:47.9106083Z #23 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T07:01:47.9106884Z #23 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T07:01:47.9107664Z #23 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T07:01:47.9108704Z #23 0.499 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T07:01:47.9110244Z #23 0.499 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T07:01:47.9111604Z 
#23 0.499 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T07:01:47.9112415Z #23 0.499 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12==12.9.86 2025-09-07T07:01:47.9113595Z #23 0.499 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T07:01:47.9114706Z #23 0.499 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.9.86) 2025-09-07T07:01:47.9115899Z #23 0.499 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T07:01:47.9117045Z #23 0.499 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T07:01:47.9117765Z #23 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T07:01:47.9118527Z #23 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T07:01:47.9119428Z #23 0.499 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T07:01:47.9120599Z #23 0.500 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T07:01:47.9122065Z #23 0.500 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T07:01:47.9123475Z #23 0.500 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T07:01:47.9124901Z #23 0.500 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 
(from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T07:01:47.9126037Z #23 0.500 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T07:01:47.9127005Z #23 0.500 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T07:01:47.9128537Z #23 0.500 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T07:01:47.9129785Z #23 0.500 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9130635Z #23 0.500 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12==12.9.79 2025-09-07T07:01:47.9132155Z #23 0.500 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T07:01:47.9133307Z #23 0.500 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.9.79) 2025-09-07T07:01:47.9134136Z #23 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T07:01:47.9135470Z #23 0.500 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9136743Z #23 0.500 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9137510Z #23 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T07:01:47.9138305Z #23 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufile-cu12/ 
2025-09-07T07:01:47.9139133Z #23 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T07:01:47.9139954Z #23 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/setuptools/ 2025-09-07T07:01:47.9140740Z #23 0.501 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T07:01:47.9141542Z #23 0.501 DEBUG Found not-modified response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T07:01:47.9142914Z #23 0.501 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9144580Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T07:01:47.9146038Z #23 0.501 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9147220Z #23 0.501 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9148193Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T07:01:47.9149964Z #23 0.501 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T07:01:47.9151149Z #23 0.501 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9152011Z #23 0.501 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12==12.9.79 2025-09-07T07:01:47.9153255Z #23 0.501 DEBUG Adding transitive dependency for 
nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T07:01:47.9154378Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 (==12.9.79) 2025-09-07T07:01:47.9155534Z #23 0.501 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9156667Z #23 0.501 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9157811Z #23 0.501 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9159286Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T07:01:47.9160703Z #23 0.501 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9161906Z #23 0.501 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9162837Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T07:01:47.9164203Z #23 0.501 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T07:01:47.9165277Z #23 0.501 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T07:01:47.9166031Z #23 0.501 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T07:01:47.9167136Z #23 0.501 DEBUG Adding 
transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T07:01:47.9168122Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T07:01:47.9169213Z #23 0.501 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T07:01:47.9170234Z #23 0.501 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T07:01:47.9171522Z #23 0.501 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T07:01:47.9172763Z #23 0.501 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T07:01:47.9173793Z #23 0.501 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T07:01:47.9175119Z #23 0.501 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T07:01:47.9176575Z #23 0.501 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T07:01:47.9177643Z #23 0.501 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T07:01:47.9178435Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T07:01:47.9179527Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.1.4, <12.9.1.4+) 2025-09-07T07:01:47.9180965Z #23 0.502 DEBUG Found installed version of 
nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.9.1.4, <12.9.1.4+ 2025-09-07T07:01:47.9182096Z #23 0.502 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T07:01:47.9182893Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12==12.9.1.4 2025-09-07T07:01:47.9184159Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.1.4 2025-09-07T07:01:47.9185171Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.9.1.4) 2025-09-07T07:01:47.9186220Z #23 0.502 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T07:01:47.9187669Z #23 0.502 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T07:01:47.9188696Z #23 0.502 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T07:01:47.9189586Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.1.4) 2025-09-07T07:01:47.9190876Z #23 0.502 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T07:01:47.9191888Z #23 0.502 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T07:01:47.9192821Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.4.1.4, <11.4.1.4+) 2025-09-07T07:01:47.9194257Z #23 0.502 DEBUG Found installed version of 
nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.4.1.4, <11.4.1.4+ 2025-09-07T07:01:47.9195380Z #23 0.502 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T07:01:47.9196160Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12==11.4.1.4 2025-09-07T07:01:47.9197258Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.4.1.4 2025-09-07T07:01:47.9198233Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.4.1.4) 2025-09-07T07:01:47.9199395Z #23 0.502 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T07:01:47.9200999Z #23 0.502 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T07:01:47.9202090Z #23 0.502 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T07:01:47.9202827Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T07:01:47.9203807Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.4.1.4) 2025-09-07T07:01:47.9205178Z #23 0.502 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T07:01:47.9206768Z #23 0.502 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from 
file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T07:01:47.9207851Z #23 0.502 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T07:01:47.9208577Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T07:01:47.9209609Z #23 0.502 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.10.19, <10.3.10.19+) 2025-09-07T07:01:47.9211086Z #23 0.502 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.10.19, <10.3.10.19+ 2025-09-07T07:01:47.9212398Z #23 0.502 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T07:01:47.9213221Z #23 0.502 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12==10.3.10.19 2025-09-07T07:01:47.9214474Z #23 0.502 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.10.19 2025-09-07T07:01:47.9215557Z #23 0.502 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.10.19) 2025-09-07T07:01:47.9216688Z #23 0.502 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T07:01:47.9217798Z #23 0.502 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T07:01:47.9218901Z #23 0.502 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T07:01:47.9220294Z #23 0.502 DEBUG Searching for a compatible version of 
nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.10.19) 2025-09-07T07:01:47.9221682Z #23 0.502 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T07:01:47.9222779Z #23 0.502 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T07:01:47.9224021Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.5.82, <11.7.5.82+) 2025-09-07T07:01:47.9225440Z #23 0.502 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.5.82, <11.7.5.82+ 2025-09-07T07:01:47.9226538Z #23 0.502 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T07:01:47.9227398Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12==11.7.5.82 2025-09-07T07:01:47.9228561Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.5.82 2025-09-07T07:01:47.9229603Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.5.82) 2025-09-07T07:01:47.9230699Z #23 0.502 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T07:01:47.9232171Z #23 0.502 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T07:01:47.9233241Z #23 0.502 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T07:01:47.9234000Z #23 0.502 
DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T07:01:47.9234848Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T07:01:47.9235723Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T07:01:47.9236736Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.5.82) 2025-09-07T07:01:47.9238134Z #23 0.502 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T07:01:47.9239673Z #23 0.502 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T07:01:47.9240732Z #23 0.502 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T07:01:47.9241487Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T07:01:47.9242360Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T07:01:47.9243228Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T07:01:47.9244295Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.10.65, <12.5.10.65+) 2025-09-07T07:01:47.9245802Z #23 0.502 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.10.65, <12.5.10.65+ 2025-09-07T07:01:47.9247049Z #23 0.502 DEBUG Selecting: 
nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T07:01:47.9247874Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12==12.5.10.65 2025-09-07T07:01:47.9249377Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.10.65 2025-09-07T07:01:47.9250680Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.10.65) 2025-09-07T07:01:47.9252101Z #23 0.502 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T07:01:47.9253890Z #23 0.502 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T07:01:47.9255197Z #23 0.502 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T07:01:47.9256064Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T07:01:47.9257172Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.10.65) 2025-09-07T07:01:47.9258697Z #23 0.502 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T07:01:47.9259931Z #23 0.502 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T07:01:47.9260760Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T07:01:47.9261874Z #23 0.502 DEBUG Searching for a compatible version 
of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T07:01:47.9263442Z #23 0.502 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T07:01:47.9264540Z #23 0.502 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T07:01:47.9265315Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T07:01:47.9266486Z #23 0.502 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T07:01:47.9267532Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T07:01:47.9268776Z #23 0.502 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T07:01:47.9270272Z #23 0.502 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T07:01:47.9271379Z #23 0.502 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T07:01:47.9272333Z #23 0.502 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T07:01:47.9273703Z #23 0.502 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T07:01:47.9274776Z #23 0.502 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T07:01:47.9275718Z #23 0.502 DEBUG Searching for a compatible version of 
nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T07:01:47.9277148Z #23 0.502 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T07:01:47.9278271Z #23 0.502 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T07:01:47.9279003Z #23 0.502 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T07:01:47.9280068Z #23 0.502 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T07:01:47.9281183Z #23 0.502 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T07:01:47.9282274Z #23 0.502 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T07:01:47.9300333Z #23 0.502 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T07:01:47.9301657Z #23 0.502 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T07:01:47.9302579Z #23 0.502 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T07:01:47.9304088Z #23 0.502 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T07:01:47.9305157Z #23 0.502 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T07:01:47.9306051Z #23 0.502 DEBUG Searching for a compatible version of 
nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T07:01:47.9307488Z #23 0.502 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T07:01:47.9308636Z #23 0.502 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T07:01:47.9309377Z #23 0.502 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T07:01:47.9310491Z #23 0.502 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T07:01:47.9311483Z #23 0.502 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T07:01:47.9312626Z #23 0.502 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T07:01:47.9314217Z #23 0.502 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T07:01:47.9315312Z #23 0.503 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T07:01:47.9316242Z #23 0.503 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T07:01:47.9317608Z #23 0.503 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T07:01:47.9318710Z #23 0.503 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T07:01:47.9319615Z #23 0.503 DEBUG 
Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T07:01:47.9320981Z #23 0.503 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T07:01:47.9322069Z #23 0.503 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9322784Z #23 0.503 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12==12.9.79 2025-09-07T07:01:47.9323827Z #23 0.503 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T07:01:47.9324791Z #23 0.503 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.9.79) 2025-09-07T07:01:47.9325895Z #23 0.503 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9327393Z #23 0.503 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9328488Z #23 0.503 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9329340Z #23 0.503 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T07:01:47.9330646Z #23 0.503 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T07:01:47.9332016Z #23 0.503 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T07:01:47.9332993Z #23 0.503 DEBUG 
Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T07:01:47.9334566Z #23 0.503 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T07:01:47.9335820Z #23 0.503 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T07:01:47.9336657Z #23 0.503 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12==12.9.86 2025-09-07T07:01:47.9337897Z #23 0.503 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T07:01:47.9338994Z #23 0.503 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.9.86) 2025-09-07T07:01:47.9340249Z #23 0.503 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T07:01:47.9341995Z #23 0.503 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T07:01:47.9343207Z #23 0.503 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T07:01:47.9344294Z #23 0.503 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T07:01:47.9345700Z #23 0.503 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T07:01:47.9346852Z #23 0.503 DEBUG Selecting: 
nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T07:01:47.9347791Z #23 0.503 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.14.1.1, <1.14.1.1+) 2025-09-07T07:01:47.9349579Z #23 0.503 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.14.1.1, <1.14.1.1+ 2025-09-07T07:01:47.9350987Z #23 0.503 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T07:01:47.9351797Z #23 0.503 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12==1.14.1.1 2025-09-07T07:01:47.9352967Z #23 0.503 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.14.1.1 2025-09-07T07:01:47.9354115Z #23 0.503 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.14.1.1) 2025-09-07T07:01:47.9355324Z #23 0.503 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T07:01:47.9357024Z #23 0.503 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T07:01:47.9358295Z #23 0.503 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T07:01:47.9359234Z #23 0.503 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.14.1.1) 2025-09-07T07:01:47.9360702Z #23 0.503 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that 
satisfies ==1.14.1.1 2025-09-07T07:01:47.9362088Z #23 0.503 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T07:01:47.9362861Z #23 0.503 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T07:01:47.9364265Z #23 0.503 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T07:01:47.9365527Z #23 0.503 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T07:01:47.9366341Z #23 0.503 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T07:01:47.9367410Z #23 0.503 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T07:01:47.9368344Z #23 0.503 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T07:01:47.9369645Z #23 0.503 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T07:01:47.9371739Z #23 0.503 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T07:01:47.9373068Z #23 0.503 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T07:01:47.9373927Z #23 0.503 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T07:01:47.9374855Z #23 0.503 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 
2025-09-07T07:01:47.9375948Z #23 0.503 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=40.8.0 2025-09-07T07:01:47.9377538Z #23 0.503 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T07:01:47.9378871Z #23 0.503 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T07:01:47.9379676Z #23 0.503 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T07:01:47.9380399Z #23 0.503 DEBUG Searching for a compatible version of numpy (*) 2025-09-07T07:01:47.9380955Z #23 0.503 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T07:01:47.9381520Z #23 0.503 DEBUG Selecting: numpy==2.2.6 [installed] (installed) 2025-09-07T07:01:47.9382051Z #23 0.503 DEBUG Searching for a compatible version of filelock (*) 2025-09-07T07:01:47.9382904Z #23 0.503 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T07:01:47.9383914Z #23 0.503 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T07:01:47.9384468Z #23 0.503 DEBUG Searching for a compatible version of typing-extensions (>=4.10.0) 2025-09-07T07:01:47.9385374Z #23 0.503 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T07:01:47.9386297Z #23 0.503 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T07:01:47.9386965Z #23 0.503 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (*) 2025-09-07T07:01:47.9387836Z #23 0.503 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that 
satisfies * 2025-09-07T07:01:47.9388597Z #23 0.503 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T07:01:47.9389186Z #23 0.503 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T07:01:47.9389960Z #23 0.503 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T07:01:47.9390690Z #23 0.503 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T07:01:47.9391490Z #23 0.503 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T07:01:47.9392284Z #23 0.503 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T07:01:47.9393072Z #23 0.503 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T07:01:47.9393987Z #23 0.503 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T07:01:47.9394914Z #23 0.503 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T07:01:47.9395690Z #23 0.503 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T07:01:47.9396211Z #23 0.503 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T07:01:47.9396947Z #23 0.503 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T07:01:47.9397651Z #23 0.503 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T07:01:47.9398201Z #23 0.503 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T07:01:47.9398823Z #23 0.503 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T07:01:47.9399588Z #23 0.503 DEBUG Found installed version of networkx==3.5 (from 
file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T07:01:47.9400306Z #23 0.503 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T07:01:47.9400773Z #23 0.503 DEBUG Searching for a compatible version of jinja2 (*) 2025-09-07T07:01:47.9401465Z #23 0.503 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T07:01:47.9402145Z #23 0.503 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T07:01:47.9402679Z #23 0.503 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T07:01:47.9403237Z #23 0.503 DEBUG Searching for a compatible version of fsspec (>=0.8.5) 2025-09-07T07:01:47.9404012Z #23 0.503 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T07:01:47.9404942Z #23 0.503 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T07:01:47.9405496Z #23 0.504 DEBUG Found stale response for: https://pypi.org/simple/mpmath/ 2025-09-07T07:01:47.9406129Z #23 0.504 DEBUG Sending revalidation request for: https://pypi.org/simple/mpmath/ 2025-09-07T07:01:47.9406799Z #23 0.504 DEBUG Found stale response for: https://pypi.org/simple/markupsafe/ 2025-09-07T07:01:47.9407475Z #23 0.504 DEBUG Sending revalidation request for: https://pypi.org/simple/markupsafe/ 2025-09-07T07:01:47.9408166Z #23 0.505 DEBUG Found not-modified response for: https://pypi.org/simple/mpmath/ 2025-09-07T07:01:47.9408780Z #23 0.505 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T07:01:47.9409666Z #23 0.505 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T07:01:47.9410707Z #23 0.505 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T07:01:47.9411734Z #23 0.505 DEBUG 
Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T07:01:47.9412367Z #23 0.505 DEBUG Found not-modified response for: https://pypi.org/simple/markupsafe/ 2025-09-07T07:01:47.9413039Z #23 0.506 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T07:01:47.9414122Z #23 0.506 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T07:01:47.9415639Z #23 0.506 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T07:01:47.9416698Z #23 0.506 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T07:01:47.9419410Z #23 0.507 DEBUG Tried 28 versions: filelock 1, fsspec 1, jinja2 1, markupsafe 1, mpmath 1, networkx 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, pytorch-triton 1, setuptools 1, sympy 1, torch 1, typing-extensions 1, xformers 1 2025-09-07T07:01:47.9422054Z #23 0.507 DEBUG marker environment resolution took 0.033s 2025-09-07T07:01:47.9422487Z #23 0.507 Resolved 28 packages in 38ms 2025-09-07T07:01:47.9423583Z #23 0.507 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T07:01:47.9425058Z #23 0.507 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T07:01:47.9426330Z #23 0.507 DEBUG Requirement already installed: 
nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T07:01:47.9427340Z #23 0.507 DEBUG Requirement already installed: sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T07:01:47.9428412Z #23 0.507 DEBUG Requirement already installed: nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T07:01:47.9429750Z #23 0.507 DEBUG Requirement already installed: nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T07:01:47.9431043Z #23 0.507 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T07:01:47.9432452Z #23 0.507 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T07:01:47.9433847Z #23 0.507 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) 2025-09-07T07:01:47.9435100Z #23 0.507 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T07:01:47.9436410Z #23 0.507 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T07:01:47.9437733Z #23 0.507 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T07:01:47.9438756Z #23 0.507 DEBUG Requirement already installed: jinja2==3.1.6 (from 
file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T07:01:47.9439817Z #23 0.507 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T07:01:47.9440913Z #23 0.507 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T07:01:47.9441987Z #23 0.507 DEBUG Requirement already installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T07:01:47.9443238Z #23 0.507 DEBUG Identified uncached distribution: xformers @ file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T07:01:47.9444299Z #23 0.507 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T07:01:47.9444964Z #23 0.507 DEBUG Requirement already installed: numpy==2.2.6 2025-09-07T07:01:47.9445918Z #23 0.507 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T07:01:47.9447270Z #23 0.507 DEBUG Requirement already installed: torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T07:01:47.9448583Z #23 0.507 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T07:01:47.9450091Z #23 0.507 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T07:01:47.9451217Z #23 0.507 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T07:01:47.9452299Z #23 0.507 DEBUG Requirement already 
installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T07:01:47.9453581Z #23 0.507 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T07:01:47.9454763Z #23 0.507 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T07:01:47.9456032Z #23 0.507 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T07:01:47.9457040Z #23 0.507 DEBUG Unnecessary package: pyyaml==6.0.2 2025-09-07T07:01:47.9457525Z #23 0.507 DEBUG Unnecessary package: aiohappyeyeballs==2.6.1 2025-09-07T07:01:47.9458001Z #23 0.507 DEBUG Unnecessary package: aiohttp==3.12.15 2025-09-07T07:01:47.9458462Z #23 0.507 DEBUG Unnecessary package: aiosignal==1.4.0 2025-09-07T07:01:47.9458939Z #23 0.507 DEBUG Unnecessary package: annotated-types==0.7.0 2025-09-07T07:01:47.9459399Z #23 0.507 DEBUG Unnecessary package: anyio==4.10.0 2025-09-07T07:01:47.9459828Z #23 0.507 DEBUG Unnecessary package: astor==0.8.1 2025-09-07T07:01:47.9460240Z #23 0.507 DEBUG Unnecessary package: attrs==25.3.0 2025-09-07T07:01:47.9460663Z #23 0.507 DEBUG Unnecessary package: blake3==1.0.5 2025-09-07T07:01:47.9461073Z #23 0.507 DEBUG Unnecessary package: build==1.3.0 2025-09-07T07:01:47.9461561Z #23 0.507 DEBUG Unnecessary package: cachetools==6.2.0 2025-09-07T07:01:47.9462002Z #23 0.507 DEBUG Unnecessary package: cbor2==5.7.0 2025-09-07T07:01:47.9462424Z #23 0.507 DEBUG Unnecessary package: certifi==2025.8.3 2025-09-07T07:01:47.9462848Z #23 0.507 DEBUG Unnecessary package: cffi==1.17.1 2025-09-07T07:01:47.9463391Z #23 0.507 DEBUG Unnecessary package: charset-normalizer==3.4.3 2025-09-07T07:01:47.9463890Z #23 0.507 DEBUG Unnecessary package: click==8.2.1 2025-09-07T07:01:47.9464273Z #23 0.507 DEBUG 
Unnecessary package: cloudpickle==3.1.1 2025-09-07T07:01:47.9464724Z #23 0.507 DEBUG Unnecessary package: compressed-tensors==0.11.0 2025-09-07T07:01:47.9465148Z #23 0.507 DEBUG Unnecessary package: depyf==0.19.0 2025-09-07T07:01:47.9465528Z #23 0.507 DEBUG Unnecessary package: dill==0.4.0 2025-09-07T07:01:47.9465911Z #23 0.507 DEBUG Unnecessary package: diskcache==5.6.3 2025-09-07T07:01:47.9466287Z #23 0.507 DEBUG Unnecessary package: distro==1.9.0 2025-09-07T07:01:47.9466677Z #23 0.507 DEBUG Unnecessary package: dnspython==2.7.0 2025-09-07T07:01:47.9467056Z #23 0.507 DEBUG Unnecessary package: einops==0.8.1 2025-09-07T07:01:47.9467467Z #23 0.507 DEBUG Unnecessary package: email-validator==2.3.0 2025-09-07T07:01:47.9467872Z #23 0.507 DEBUG Unnecessary package: fastapi==0.116.1 2025-09-07T07:01:47.9468276Z #23 0.507 DEBUG Unnecessary package: fastapi-cli==0.0.10 2025-09-07T07:01:47.9468714Z #23 0.507 DEBUG Unnecessary package: fastapi-cloud-cli==0.1.5 2025-09-07T07:01:47.9469132Z #23 0.507 DEBUG Unnecessary package: frozendict==2.4.6 2025-09-07T07:01:47.9469533Z #23 0.507 DEBUG Unnecessary package: frozenlist==1.7.0 2025-09-07T07:01:47.9469912Z #23 0.507 DEBUG Unnecessary package: gguf==0.17.1 2025-09-07T07:01:47.9470290Z #23 0.507 DEBUG Unnecessary package: h11==0.16.0 2025-09-07T07:01:47.9470655Z #23 0.507 DEBUG Unnecessary package: hf-xet==1.1.9 2025-09-07T07:01:47.9471052Z #23 0.507 DEBUG Unnecessary package: httpcore==1.0.9 2025-09-07T07:01:47.9471445Z #23 0.507 DEBUG Unnecessary package: httptools==0.6.4 2025-09-07T07:01:47.9471843Z #23 0.507 DEBUG Unnecessary package: httpx==0.28.1 2025-09-07T07:01:47.9472268Z #23 0.507 DEBUG Unnecessary package: huggingface-hub==0.34.4 2025-09-07T07:01:47.9472675Z #23 0.507 DEBUG Unnecessary package: idna==3.10 2025-09-07T07:01:47.9473070Z #23 0.507 DEBUG Unnecessary package: interegular==0.3.3 2025-09-07T07:01:47.9473462Z #23 0.507 DEBUG Unnecessary package: jiter==0.10.0 2025-09-07T07:01:47.9473868Z #23 0.507 DEBUG 
Unnecessary package: jsonschema==4.25.1 2025-09-07T07:01:47.9474346Z #23 0.507 DEBUG Unnecessary package: jsonschema-specifications==2025.4.1 2025-09-07T07:01:47.9474857Z #23 0.507 DEBUG Unnecessary package: lark==1.2.2 2025-09-07T07:01:47.9475258Z #23 0.507 DEBUG Unnecessary package: llguidance==0.7.30 2025-09-07T07:01:47.9475653Z #23 0.507 DEBUG Unnecessary package: llvmlite==0.44.0 2025-09-07T07:01:47.9476097Z #23 0.507 DEBUG Unnecessary package: lm-format-enforcer==0.11.3 2025-09-07T07:01:47.9476547Z #23 0.507 DEBUG Unnecessary package: markdown-it-py==4.0.0 2025-09-07T07:01:47.9476969Z #23 0.507 DEBUG Unnecessary package: mdurl==0.1.2 2025-09-07T07:01:47.9477379Z #23 0.507 DEBUG Unnecessary package: mistral-common==1.8.4 2025-09-07T07:01:47.9477790Z #23 0.507 DEBUG Unnecessary package: msgspec==0.19.0 2025-09-07T07:01:47.9478197Z #23 0.507 DEBUG Unnecessary package: multidict==6.6.4 2025-09-07T07:01:47.9478598Z #23 0.507 DEBUG Unnecessary package: ninja==1.13.0 2025-09-07T07:01:47.9478975Z #23 0.507 DEBUG Unnecessary package: numba==0.61.2 2025-09-07T07:01:47.9479371Z #23 0.507 DEBUG Unnecessary package: openai==1.106.1 2025-09-07T07:01:47.9479781Z #23 0.507 DEBUG Unnecessary package: openai-harmony==0.0.4 2025-09-07T07:01:47.9480279Z #23 0.507 DEBUG Unnecessary package: opencv-python-headless==4.12.0.88 2025-09-07T07:01:47.9480749Z #23 0.508 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T07:01:47.9481177Z #23 0.508 DEBUG Unnecessary package: outlines-core==0.2.10 2025-09-07T07:01:47.9481589Z #23 0.508 DEBUG Unnecessary package: packaging==25.0 2025-09-07T07:01:47.9482105Z #23 0.508 DEBUG Unnecessary package: partial-json-parser==0.2.1.1.post6 2025-09-07T07:01:47.9482928Z #23 0.508 DEBUG Unnecessary package: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T07:01:47.9483654Z #23 0.508 DEBUG Preserving seed package: pip==25.2 2025-09-07T07:01:47.9484111Z #23 0.508 DEBUG Unnecessary 
package: prometheus-client==0.22.1 2025-09-07T07:01:47.9484681Z #23 0.508 DEBUG Unnecessary package: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T07:01:47.9485193Z #23 0.508 DEBUG Unnecessary package: propcache==0.3.2 2025-09-07T07:01:47.9485592Z #23 0.508 DEBUG Unnecessary package: protobuf==6.32.0 2025-09-07T07:01:47.9485966Z #23 0.508 DEBUG Unnecessary package: psutil==7.0.0 2025-09-07T07:01:47.9486349Z #23 0.508 DEBUG Unnecessary package: py-cpuinfo==9.0.0 2025-09-07T07:01:47.9486734Z #23 0.508 DEBUG Unnecessary package: pybase64==1.4.2 2025-09-07T07:01:47.9487124Z #23 0.508 DEBUG Unnecessary package: pycountry==24.6.1 2025-09-07T07:01:47.9487515Z #23 0.508 DEBUG Unnecessary package: pycparser==2.22 2025-09-07T07:01:47.9487904Z #23 0.508 DEBUG Unnecessary package: pydantic==2.11.7 2025-09-07T07:01:47.9488302Z #23 0.508 DEBUG Unnecessary package: pydantic-core==2.33.2 2025-09-07T07:01:47.9488753Z #23 0.508 DEBUG Unnecessary package: pydantic-extra-types==2.10.5 2025-09-07T07:01:47.9489197Z #23 0.508 DEBUG Unnecessary package: pygments==2.19.2 2025-09-07T07:01:47.9489603Z #23 0.508 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T07:01:47.9490035Z #23 0.508 DEBUG Unnecessary package: python-dotenv==1.1.1 2025-09-07T07:01:47.9490467Z #23 0.508 DEBUG Unnecessary package: python-json-logger==3.3.0 2025-09-07T07:01:47.9491014Z #23 0.508 DEBUG Unnecessary package: python-multipart==0.0.20 2025-09-07T07:01:47.9491615Z #23 0.508 DEBUG Unnecessary package: pyzmq==27.0.2 2025-09-07T07:01:47.9492056Z #23 0.508 DEBUG Unnecessary package: referencing==0.36.2 2025-09-07T07:01:47.9492509Z #23 0.508 DEBUG Unnecessary package: regex==2025.9.1 2025-09-07T07:01:47.9492939Z #23 0.508 DEBUG Unnecessary package: requests==2.32.5 2025-09-07T07:01:47.9493369Z #23 0.508 DEBUG Unnecessary package: rich==14.1.0 2025-09-07T07:01:47.9493801Z #23 0.508 DEBUG Unnecessary package: rich-toolkit==0.15.1 2025-09-07T07:01:47.9494248Z #23 0.508 DEBUG Unnecessary package: 
rignore==0.6.4 2025-09-07T07:01:47.9494682Z #23 0.508 DEBUG Unnecessary package: rpds-py==0.27.1 2025-09-07T07:01:47.9495131Z #23 0.508 DEBUG Unnecessary package: safetensors==0.6.2 2025-09-07T07:01:47.9495565Z #23 0.508 DEBUG Unnecessary package: scipy==1.16.1 2025-09-07T07:01:47.9496052Z #23 0.508 DEBUG Unnecessary package: sentencepiece==0.2.1 2025-09-07T07:01:47.9496523Z #23 0.508 DEBUG Unnecessary package: sentry-sdk==2.37.0 2025-09-07T07:01:47.9496977Z #23 0.508 DEBUG Unnecessary package: setproctitle==1.3.7 2025-09-07T07:01:47.9497437Z #23 0.508 DEBUG Unnecessary package: shellingham==1.5.4 2025-09-07T07:01:47.9497861Z #23 0.508 DEBUG Unnecessary package: six==1.17.0 2025-09-07T07:01:47.9498280Z #23 0.508 DEBUG Unnecessary package: sniffio==1.3.1 2025-09-07T07:01:47.9498718Z #23 0.508 DEBUG Unnecessary package: soundfile==0.13.1 2025-09-07T07:01:47.9499169Z #23 0.508 DEBUG Unnecessary package: soxr==0.5.0.post1 2025-09-07T07:01:47.9499610Z #23 0.508 DEBUG Unnecessary package: starlette==0.47.3 2025-09-07T07:01:47.9500045Z #23 0.508 DEBUG Unnecessary package: tiktoken==0.11.0 2025-09-07T07:01:47.9500492Z #23 0.508 DEBUG Unnecessary package: tokenizers==0.22.0 2025-09-07T07:01:47.9501443Z #23 0.508 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250901+cu129 (from file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T07:01:47.9502914Z #23 0.508 DEBUG Unnecessary package: torchvision==0.24.0.dev20250901+cu129 (from file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T07:01:47.9503935Z #23 0.508 DEBUG Unnecessary package: tqdm==4.67.1 2025-09-07T07:01:47.9504407Z #23 0.508 DEBUG Unnecessary package: transformers==4.56.1 2025-09-07T07:01:47.9504834Z #23 0.508 DEBUG Unnecessary package: triton==3.4.0 2025-09-07T07:01:47.9505228Z #23 0.508 DEBUG Unnecessary package: typer==0.17.4 2025-09-07T07:01:47.9505672Z #23 0.508 DEBUG Unnecessary package: typing-inspection==0.4.1 
2025-09-07T07:01:47.9506112Z #23 0.508 DEBUG Unnecessary package: urllib3==2.5.0 2025-09-07T07:01:47.9506580Z #23 0.508 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T07:01:47.9506975Z #23 0.508 DEBUG Unnecessary package: uvicorn==0.35.0 2025-09-07T07:01:47.9507384Z #23 0.508 DEBUG Unnecessary package: uvloop==0.21.0 2025-09-07T07:01:47.9507808Z #23 0.508 DEBUG Unnecessary package: watchfiles==1.1.0 2025-09-07T07:01:47.9508232Z #23 0.508 DEBUG Unnecessary package: websockets==15.0.1 2025-09-07T07:01:47.9508645Z #23 0.508 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T07:01:47.9509041Z #23 0.508 DEBUG Unnecessary package: xgrammar==0.1.23 2025-09-07T07:01:47.9509449Z #23 0.508 DEBUG Unnecessary package: yarl==1.20.1 2025-09-07T07:01:49.8670417Z #23 2.624 Prepared 1 package in 2.11s 2025-09-07T07:01:50.3277035Z #23 3.085 Installed 1 package in 460ms 2025-09-07T07:01:50.3277986Z #23 3.085 + xformers==0.0.33+5d4b92a5.d20250907 (from file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl) 2025-09-07T07:01:50.4781727Z #23 3.085 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T07:02:05.8452927Z #23 DONE 18.6s 2025-09-07T07:02:05.9990213Z 2025-09-07T07:02:05.9991018Z #24 [base 18/20] RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio' > torch_build_versions.txt 2025-09-07T07:02:06.3412540Z #24 0.493 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T07:02:06.5201774Z #24 DONE 0.5s 2025-09-07T07:02:06.5202492Z 2025-09-07T07:02:06.5202931Z #25 [base 19/20] RUN cat torch_build_versions.txt 2025-09-07T07:02:07.0209543Z #25 0.652 torch @ file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T07:02:07.0210532Z #25 0.652 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T07:02:07.0211901Z #25 0.652 torchvision @ 
file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T07:02:07.1904687Z #25 DONE 0.7s 2025-09-07T07:02:07.1905188Z 2025-09-07T07:02:07.1905968Z #26 [base 20/20] RUN pip freeze | grep -E 'torch|xformers|torchvision|torchaudio' 2025-09-07T07:02:08.1584730Z #26 1.119 pytorch-triton @ file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T07:02:08.1586046Z #26 1.119 torch @ file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T07:02:08.1586941Z #26 1.119 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T07:02:08.1587871Z #26 1.119 torchvision @ file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T07:02:08.1588829Z #26 1.119 xformers @ file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T07:02:08.3709642Z #26 DONE 1.1s 2025-09-07T07:02:08.3709899Z 2025-09-07T07:02:08.3710321Z #27 [vllm-base 4/18] COPY --from=base /workspace/torch_build_versions.txt ./torch_build_versions.txt 2025-09-07T07:02:08.3710935Z #27 DONE 0.0s 2025-09-07T07:02:08.3711101Z 2025-09-07T07:02:08.3711205Z #28 [build 1/7] COPY . . 2025-09-07T07:02:09.5737021Z #28 ... 2025-09-07T07:02:09.5737353Z 2025-09-07T07:02:09.5737941Z #29 [export-wheels 1/4] COPY --from=base /workspace/xformers-dist /wheels/xformers 2025-09-07T07:02:09.5738919Z #29 DONE 0.4s 2025-09-07T07:02:09.7242541Z 2025-09-07T07:02:24.0704248Z #30 [vllm-base 5/18] COPY --from=base /workspace/xformers-dist /wheels/xformers 2025-09-07T07:02:24.0704848Z #30 DONE 14.9s 2025-09-07T07:02:24.0705019Z 2025-09-07T07:02:24.0705139Z #28 [build 1/7] COPY . . 
2025-09-07T07:02:46.8010693Z #28 DONE 38.6s 2025-09-07T07:02:46.9552287Z 2025-09-07T07:02:46.9554849Z #31 [build 2/7] RUN python3 use_existing_torch.py 2025-09-07T07:02:47.2618079Z #31 0.457 >>> cleaning requirements/common.txt 2025-09-07T07:02:47.2618576Z #31 0.457 <<< done cleaning requirements/common.txt 2025-09-07T07:02:47.2618951Z #31 0.457 2025-09-07T07:02:47.2619234Z #31 0.457 >>> cleaning requirements/build.txt 2025-09-07T07:02:47.2619809Z #31 0.457 removed: 2025-09-07T07:02:47.2620178Z #31 0.457 torch==2.8.0 2025-09-07T07:02:47.2620497Z #31 0.457 <<< done cleaning requirements/build.txt 2025-09-07T07:02:47.2620891Z #31 0.457 2025-09-07T07:02:47.2621198Z #31 0.457 >>> cleaning requirements/cpu-build.txt 2025-09-07T07:02:47.2621565Z #31 0.457 removed: 2025-09-07T07:02:47.2622090Z #31 0.457 # Temporarily used for x86 CPU backend to avoid performance regression of torch>2.6.0+cpu, 2025-09-07T07:02:47.2622785Z #31 0.457 # see https://github.com/pytorch/pytorch/pull/151218 2025-09-07T07:02:47.2623483Z #31 0.457 --extra-index-url https://download.pytorch.org/whl/cpu 2025-09-07T07:02:47.2623926Z #31 0.457 torch==2.6.0+cpu 2025-09-07T07:02:47.2624275Z #31 0.457 <<< done cleaning requirements/cpu-build.txt 2025-09-07T07:02:47.2624637Z #31 0.457 2025-09-07T07:02:47.2624906Z #31 0.457 >>> cleaning requirements/cpu.txt 2025-09-07T07:02:47.2625266Z #31 0.457 removed: 2025-09-07T07:02:47.2625619Z #31 0.457 --extra-index-url https://download.pytorch.org/whl/cpu 2025-09-07T07:02:47.2626649Z #31 0.457 torch==2.6.0+cpu; platform_machine == "x86_64" # torch>2.6.0+cpu has performance regression on x86 platform, see https://github.com/pytorch/pytorch/pull/151218 2025-09-07T07:02:47.2627539Z #31 0.457 torch==2.8.0; platform_system == "Darwin" 2025-09-07T07:02:47.2628068Z #31 0.457 torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64" 2025-09-07T07:02:47.2628814Z #31 0.457 # required for the image processor of minicpm-o-2_6, this must be updated alongside 
torch 2025-09-07T07:02:47.2629569Z #31 0.457 torchaudio; platform_machine != "ppc64le" and platform_machine != "s390x" 2025-09-07T07:02:47.2630150Z #31 0.457 torchaudio==2.8.0; platform_machine == "ppc64le" 2025-09-07T07:02:47.2630754Z #31 0.457 # required for the image processor of phi3v, this must be updated alongside torch 2025-09-07T07:02:47.2631460Z #31 0.457 torchvision; platform_machine != "ppc64le" and platform_machine != "s390x" 2025-09-07T07:02:47.2632064Z #31 0.457 torchvision==0.23.0; platform_machine == "ppc64le" 2025-09-07T07:02:47.2632556Z #31 0.457 # Intel Extension for PyTorch, only for x86_64 CPUs 2025-09-07T07:02:47.2633622Z #31 0.457 intel_extension_for_pytorch==2.6.0; platform_machine == "x86_64" # torch>2.6.0+cpu has performance regression on x86 platform, see https://github.com/pytorch/pytorch/pull/151218 2025-09-07T07:02:47.2634872Z #31 0.457 triton==3.2.0; platform_machine == "x86_64" # Triton is required for torch 2.6+cpu, as it is imported in torch.compile. 2025-09-07T07:02:47.2635564Z #31 0.457 <<< done cleaning requirements/cpu.txt 2025-09-07T07:02:47.2635931Z #31 0.457 2025-09-07T07:02:47.2636192Z #31 0.457 >>> cleaning requirements/cuda.txt 2025-09-07T07:02:47.2636550Z #31 0.457 removed: 2025-09-07T07:02:47.2636803Z #31 0.457 torch==2.8.0 2025-09-07T07:02:47.2637068Z #31 0.457 torchaudio==2.8.0 2025-09-07T07:02:47.2637520Z #31 0.457 # These must be updated alongside torch 2025-09-07T07:02:47.2638371Z #31 0.457 torchvision==0.23.0 # Required for phi3v processor. 
See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version 2025-09-07T07:02:47.2639491Z #31 0.457 xformers==0.0.32.post1; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.8 2025-09-07T07:02:47.2640141Z #31 0.457 <<< done cleaning requirements/cuda.txt 2025-09-07T07:02:47.2640505Z #31 0.457 2025-09-07T07:02:47.2640772Z #31 0.457 >>> cleaning requirements/dev.txt 2025-09-07T07:02:47.2641146Z #31 0.457 <<< done cleaning requirements/dev.txt 2025-09-07T07:02:47.2641492Z #31 0.457 2025-09-07T07:02:47.2641742Z #31 0.457 >>> cleaning requirements/docs.txt 2025-09-07T07:02:47.2642145Z #31 0.457 removed: 2025-09-07T07:02:47.2642445Z #31 0.457 -f https://download.pytorch.org/whl/cpu 2025-09-07T07:02:47.2642815Z #31 0.457 torch 2025-09-07T07:02:47.2643102Z #31 0.457 <<< done cleaning requirements/docs.txt 2025-09-07T07:02:47.2643609Z #31 0.457 2025-09-07T07:02:47.2643885Z #31 0.457 >>> cleaning requirements/kv_connectors.txt 2025-09-07T07:02:47.2644374Z #31 0.457 <<< done cleaning requirements/kv_connectors.txt 2025-09-07T07:02:47.2644836Z #31 0.457 2025-09-07T07:02:47.2645091Z #31 0.457 >>> cleaning requirements/lint.txt 2025-09-07T07:02:47.2645482Z #31 0.457 <<< done cleaning requirements/lint.txt 2025-09-07T07:02:47.2645823Z #31 0.457 2025-09-07T07:02:47.2646128Z #31 0.457 >>> cleaning requirements/nightly_torch_test.txt 2025-09-07T07:02:47.2646588Z #31 0.457 <<< done cleaning requirements/nightly_torch_test.txt 2025-09-07T07:02:47.2647011Z #31 0.457 2025-09-07T07:02:47.2647275Z #31 0.457 >>> cleaning requirements/rocm-build.txt 2025-09-07T07:02:47.2647642Z #31 0.457 removed: 2025-09-07T07:02:47.2648010Z #31 0.457 --extra-index-url https://download.pytorch.org/whl/rocm6.3 2025-09-07T07:02:47.2648464Z #31 0.457 torch==2.8.0 2025-09-07T07:02:47.2649213Z #31 0.457 torchvision==0.23.0 2025-09-07T07:02:47.2649699Z #31 0.457 torchaudio==2.8.0 2025-09-07T07:02:47.2650069Z #31 0.457 <<< done cleaning 
requirements/rocm-build.txt 2025-09-07T07:02:47.2650450Z #31 0.457 2025-09-07T07:02:47.2650748Z #31 0.457 >>> cleaning requirements/rocm-test.txt 2025-09-07T07:02:47.2651265Z #31 0.457 <<< done cleaning requirements/rocm-test.txt 2025-09-07T07:02:47.2651659Z #31 0.457 2025-09-07T07:02:47.2651926Z #31 0.457 >>> cleaning requirements/rocm.txt 2025-09-07T07:02:47.2652343Z #31 0.457 <<< done cleaning requirements/rocm.txt 2025-09-07T07:02:47.2652716Z #31 0.457 2025-09-07T07:02:47.2652978Z #31 0.457 >>> cleaning requirements/test.txt 2025-09-07T07:02:47.2653336Z #31 0.457 removed: 2025-09-07T07:02:47.2654005Z #31 0.457 # uv pip compile requirements/test.in -o requirements/test.txt --index-strategy unsafe-best-match --torch-backend cu128 2025-09-07T07:02:47.2654756Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2655042Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2655360Z #31 0.457 efficientnet-pytorch==0.7.1 2025-09-07T07:02:47.2655741Z #31 0.457 # via segmentation-models-pytorch 2025-09-07T07:02:47.2656112Z #31 0.457 # terratorch 2025-09-07T07:02:47.2656398Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2656684Z #31 0.457 # vector-quantize-pytorch 2025-09-07T07:02:47.2657054Z #31 0.457 # via vector-quantize-pytorch 2025-09-07T07:02:47.2657387Z #31 0.457 # torch 2025-09-07T07:02:47.2657652Z #31 0.457 # via torchgeo 2025-09-07T07:02:47.2658031Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2658346Z #31 0.457 # torch 2025-09-07T07:02:47.2658604Z #31 0.457 # via open-clip-torch 2025-09-07T07:02:47.2658928Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2659211Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2659519Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2659853Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2660216Z #31 0.457 # terratorch 2025-09-07T07:02:47.2660492Z #31 0.457 # torch 2025-09-07T07:02:47.2660762Z #31 0.457 # terratorch 2025-09-07T07:02:47.2661043Z #31 0.457 # via torchgeo 2025-09-07T07:02:47.2661309Z #31 0.457 # terratorch 
2025-09-07T07:02:47.2661586Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2661848Z #31 0.457 # terratorch 2025-09-07T07:02:47.2662128Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2662404Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2662848Z #31 0.457 # torchmetrics 2025-09-07T07:02:47.2663141Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2663413Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2663693Z #31 0.457 # torch 2025-09-07T07:02:47.2663963Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2664311Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2664564Z #31 0.457 # torchmetrics 2025-09-07T07:02:47.2664850Z #31 0.457 # torchvision 2025-09-07T07:02:47.2665110Z #31 0.457 # torch 2025-09-07T07:02:47.2665415Z #31 0.457 # via torch 2025-09-07T07:02:47.2665664Z #31 0.457 # via torch 2025-09-07T07:02:47.2665924Z #31 0.457 # via torch 2025-09-07T07:02:47.2666169Z #31 0.457 # via torch 2025-09-07T07:02:47.2666429Z #31 0.457 # via torch 2025-09-07T07:02:47.2666673Z #31 0.457 # via torch 2025-09-07T07:02:47.2666932Z #31 0.457 # via torch 2025-09-07T07:02:47.2667240Z #31 0.457 # via torch 2025-09-07T07:02:47.2667533Z #31 0.457 # torch 2025-09-07T07:02:47.2667785Z #31 0.457 # via torch 2025-09-07T07:02:47.2668027Z #31 0.457 # via torch 2025-09-07T07:02:47.2668281Z #31 0.457 # torch 2025-09-07T07:02:47.2668513Z #31 0.457 # via torch 2025-09-07T07:02:47.2668797Z #31 0.457 open-clip-torch==2.32.0 2025-09-07T07:02:47.2669115Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2669424Z #31 0.457 # torchmetrics 2025-09-07T07:02:47.2669691Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2670066Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2670400Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2670669Z #31 0.457 # torchvision 2025-09-07T07:02:47.2670988Z #31 0.457 # via segmentation-models-pytorch 2025-09-07T07:02:47.2671335Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2671617Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2671866Z #31 0.457 # terratorch 
2025-09-07T07:02:47.2672146Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2672439Z #31 0.457 pytorch-lightning==2.5.2 2025-09-07T07:02:47.2672790Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2673086Z #31 0.457 # terratorch 2025-09-07T07:02:47.2673359Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2673618Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2673927Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2674211Z #31 0.457 # via torchgeo 2025-09-07T07:02:47.2674483Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2674828Z #31 0.457 segmentation-models-pytorch==0.4.0 2025-09-07T07:02:47.2675181Z #31 0.457 # terratorch 2025-09-07T07:02:47.2675452Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2675697Z #31 0.457 # torch 2025-09-07T07:02:47.2675946Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2676230Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2676574Z #31 0.457 # torch 2025-09-07T07:02:47.2676824Z #31 0.457 terratorch==1.1rc3 2025-09-07T07:02:47.2677126Z #31 0.457 # terratorch 2025-09-07T07:02:47.2677416Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2677742Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2678101Z #31 0.457 # terratorch 2025-09-07T07:02:47.2678361Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2678638Z #31 0.457 torch==2.8.0+cu128 2025-09-07T07:02:47.2678936Z #31 0.457 # efficientnet-pytorch 2025-09-07T07:02:47.2679321Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2679626Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2679974Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2680316Z #31 0.457 # terratorch 2025-09-07T07:02:47.2680602Z #31 0.457 # torchaudio 2025-09-07T07:02:47.2680885Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2681144Z #31 0.457 # torchmetrics 2025-09-07T07:02:47.2681436Z #31 0.457 # torchvision 2025-09-07T07:02:47.2681736Z #31 0.457 # vector-quantize-pytorch 2025-09-07T07:02:47.2682090Z #31 0.457 torchaudio==2.8.0+cu128 2025-09-07T07:02:47.2682391Z #31 0.457 torchgeo==0.7.0 
2025-09-07T07:02:47.2682681Z #31 0.457 # via terratorch 2025-09-07T07:02:47.2682967Z #31 0.457 torchmetrics==1.7.4 2025-09-07T07:02:47.2683283Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2683577Z #31 0.457 # terratorch 2025-09-07T07:02:47.2683847Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2684115Z #31 0.457 torchvision==0.23.0+cu128 2025-09-07T07:02:47.2684450Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2684789Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2685125Z #31 0.457 # terratorch 2025-09-07T07:02:47.2685396Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2685657Z #31 0.457 # open-clip-torch 2025-09-07T07:02:47.2685969Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2686293Z #31 0.457 # segmentation-models-pytorch 2025-09-07T07:02:47.2686690Z #31 0.457 # via torch 2025-09-07T07:02:47.2686958Z #31 0.457 # pytorch-lightning 2025-09-07T07:02:47.2687259Z #31 0.457 # torch 2025-09-07T07:02:47.2687503Z #31 0.457 # torchgeo 2025-09-07T07:02:47.2687806Z #31 0.457 vector-quantize-pytorch==1.21.2 2025-09-07T07:02:47.2688209Z #31 0.457 <<< done cleaning requirements/test.txt 2025-09-07T07:02:47.2688632Z #31 0.457 2025-09-07T07:02:47.2688901Z #31 0.457 >>> cleaning requirements/tpu.txt 2025-09-07T07:02:47.2689235Z #31 0.457 removed: 2025-09-07T07:02:47.2689495Z #31 0.457 # Install torch_xla 2025-09-07T07:02:47.2689945Z #31 0.457 --extra-index-url https://download.pytorch.org/whl/nightly/cpu 2025-09-07T07:02:47.2690442Z #31 0.457 torch==2.9.0.dev20250730 2025-09-07T07:02:47.2690762Z #31 0.457 torchvision==0.24.0.dev20250730 2025-09-07T07:02:47.2692099Z #31 0.457 torch_xla[tpu, pallas] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.9.0.dev20250730-cp311-cp311-linux_x86_64.whl ; python_version == "3.11" 2025-09-07T07:02:47.2693806Z #31 0.457 torch_xla[tpu, pallas] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.9.0.dev20250730-cp312-cp312-linux_x86_64.whl ; python_version == "3.12" 
2025-09-07T07:02:47.2694860Z #31 0.457 <<< done cleaning requirements/tpu.txt 2025-09-07T07:02:47.2695239Z #31 0.457 2025-09-07T07:02:47.2695504Z #31 0.457 >>> cleaning requirements/xpu.txt 2025-09-07T07:02:47.2695868Z #31 0.457 removed: 2025-09-07T07:02:47.2696239Z #31 0.457 --extra-index-url=https://download.pytorch.org/whl/xpu 2025-09-07T07:02:47.2696701Z #31 0.457 torch==2.8.0+xpu 2025-09-07T07:02:47.2696999Z #31 0.457 torchaudio 2025-09-07T07:02:47.2697256Z #31 0.457 torchvision 2025-09-07T07:02:47.2697547Z #31 0.457 pytorch-triton-xpu 2025-09-07T07:02:47.2698092Z #31 0.457 --extra-index-url=https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ 2025-09-07T07:02:47.2698731Z #31 0.457 intel-extension-for-pytorch==2.8.10+xpu 2025-09-07T07:02:47.2699154Z #31 0.457 <<< done cleaning requirements/xpu.txt 2025-09-07T07:02:47.2699524Z #31 0.457 2025-09-07T07:02:47.2699772Z #31 0.457 >>> cleaning pyproject.toml 2025-09-07T07:02:47.2700110Z #31 0.457 removed: 2025-09-07T07:02:47.2700373Z #31 0.457 "torch == 2.8.0", 2025-09-07T07:02:47.2700687Z #31 0.457 <<< done cleaning pyproject.toml 2025-09-07T07:02:47.2701041Z #31 0.457 2025-09-07T07:02:47.4308439Z #31 DONE 0.5s 2025-09-07T07:02:47.4308979Z 2025-09-07T07:02:47.4310080Z #32 [build 3/7] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system -r requirements/build.txt 2025-09-07T07:02:48.0274386Z #32 0.747 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T07:02:48.2265343Z #32 0.794 Resolved 11 packages in 37ms 2025-09-07T07:02:48.2265785Z #32 0.796 Downloading cmake (28.3MiB) 2025-09-07T07:02:48.5793495Z #32 1.299 Downloading cmake 2025-09-07T07:02:48.7297414Z #32 1.300 Prepared 2 packages in 505ms 2025-09-07T07:02:48.8254891Z #32 1.546 Installed 2 packages in 245ms 2025-09-07T07:02:48.8255518Z #32 1.546 + cmake==4.1.0 2025-09-07T07:02:48.8255832Z #32 1.546 + setuptools-scm==9.2.0 2025-09-07T07:02:49.3137058Z #32 DONE 2.0s 2025-09-07T07:02:49.4665636Z 
2025-09-07T07:02:49.4666502Z #33 [build 4/7] RUN --mount=type=bind,source=.git,target=.git if [ "0" != "0" ]; then bash tools/check_repo.sh ; fi 2025-09-07T07:02:49.9604376Z #33 DONE 0.6s 2025-09-07T07:02:50.1129444Z 2025-09-07T07:02:50.1136174Z #34 [build 5/7] RUN --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=.git,target=.git if [ "1" = "1" ]; then echo "Installing sccache..." && curl -L -o sccache.tar.gz https://github.com/mozilla/sccache/releases/download/v0.8.1/sccache-v0.8.1-x86_64-unknown-linux-musl.tar.gz && tar -xzf sccache.tar.gz && sudo mv sccache-v0.8.1-x86_64-unknown-linux-musl/sccache /usr/bin/sccache && rm -rf sccache.tar.gz sccache-v0.8.1-x86_64-unknown-linux-musl && export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 && export SCCACHE_REGION=us-east-1 && export SCCACHE_S3_NO_CREDENTIALS=0 && export SCCACHE_IDLE_TIMEOUT=0 && export CMAKE_BUILD_TYPE=Release && export VLLM_DOCKER_BUILD_CONTEXT=1 && sccache --show-stats && python3 setup.py bdist_wheel --dist-dir=vllm-dist --py-limited-api=cp38 && sccache --show-stats; fi 2025-09-07T07:02:51.0050194Z #34 1.043 Installing sccache... 
2025-09-07T07:02:51.1178141Z #34 1.049 % Total % Received % Xferd Average Speed Time Time Time Current 2025-09-07T07:02:51.1178868Z #34 1.049 Dload Upload Total Spent Left Speed 2025-09-07T07:02:51.1179312Z #34 1.049 2025-09-07T07:02:51.1179638Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-09-07T07:02:51.1180073Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-09-07T07:02:51.1180457Z #34 1.156 2025-09-07T07:02:51.1180796Z 100 9113k 100 9113k 0 0 83.3M 0 --:--:-- --:--:-- --:--:-- 83.3M 2025-09-07T07:02:51.3680870Z #34 1.406 Compile requests 0 2025-09-07T07:02:51.3681377Z #34 1.406 Compile requests executed 0 2025-09-07T07:02:51.3681789Z #34 1.406 Cache hits 0 2025-09-07T07:02:51.3682173Z #34 1.406 Cache misses 0 2025-09-07T07:02:51.3682785Z #34 1.406 Cache timeouts 0 2025-09-07T07:02:51.3683186Z #34 1.406 Cache read errors 0 2025-09-07T07:02:51.3683562Z #34 1.406 Forced recaches 0 2025-09-07T07:02:51.3683957Z #34 1.406 Cache write errors 0 2025-09-07T07:02:51.3684340Z #34 1.406 Compilation failures 0 2025-09-07T07:02:51.3684734Z #34 1.406 Cache errors 0 2025-09-07T07:02:51.3685122Z #34 1.406 Non-cacheable compilations 0 2025-09-07T07:02:51.3685528Z #34 1.406 Non-cacheable calls 0 2025-09-07T07:02:51.3685916Z #34 1.406 Non-compilation calls 0 2025-09-07T07:02:51.3686326Z #34 1.406 Unsupported compiler calls 0 2025-09-07T07:02:51.3686737Z #34 1.406 Average cache write 0.000 s 2025-09-07T07:02:51.3687134Z #34 1.406 Average compiler 0.000 s 2025-09-07T07:02:51.3687542Z #34 1.406 Average cache read hit 0.000 s 2025-09-07T07:02:51.3687951Z #34 1.406 Failed distributed compilations 0 2025-09-07T07:02:51.3688512Z #34 1.406 Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-09-07T07:02:51.3689043Z #34 1.406 Version (client) 0.8.1 2025-09-07T07:02:53.6054003Z #34 3.643 W0907 07:02:53.604000 70 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:119] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 
2025-09-07T07:02:53.8678455Z #34 3.906 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set 2025-09-07T07:02:53.8679463Z #34 3.906 warnings.warn(self.message) 2025-09-07T07:02:54.1102920Z #34 4.148 running bdist_wheel 2025-09-07T07:02:54.2105449Z #34 4.194 running build 2025-09-07T07:02:54.2105842Z #34 4.194 running build_py 2025-09-07T07:02:54.2106233Z #34 4.206 creating build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2106841Z #34 4.206 copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2107487Z #34 4.206 copying vllm/_custom_ops.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2108139Z #34 4.207 copying vllm/_ipex_ops.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2108767Z #34 4.207 copying vllm/beam_search.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2109424Z #34 4.207 copying vllm/collect_env.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2110075Z #34 4.207 copying vllm/connections.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2110908Z #34 4.207 copying vllm/env_override.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2111550Z #34 4.208 copying vllm/envs.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2112187Z #34 4.208 copying vllm/forward_context.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2112854Z #34 4.208 copying vllm/logger.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2113634Z #34 4.208 copying vllm/logits_process.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2114305Z #34 4.208 copying vllm/logprobs.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2114945Z #34 4.209 copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2115593Z #34 4.209 copying vllm/pooling_params.py -> 
build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2116288Z #34 4.209 copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2116955Z #34 4.209 copying vllm/scalar_type.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2117598Z #34 4.210 copying vllm/scripts.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2118217Z #34 4.210 copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2118834Z #34 4.210 copying vllm/tasks.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2119457Z #34 4.210 copying vllm/test_utils.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2120080Z #34 4.210 copying vllm/tracing.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2120715Z #34 4.211 copying vllm/version.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2121329Z #34 4.211 copying vllm/_version.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.2121949Z #34 4.211 creating build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2122695Z #34 4.211 copying vllm/adapter_commons/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2123600Z #34 4.211 copying vllm/adapter_commons/layers.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2124501Z #34 4.212 copying vllm/adapter_commons/models.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2125393Z #34 4.212 copying vllm/adapter_commons/request.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2126298Z #34 4.212 copying vllm/adapter_commons/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2127284Z #34 4.212 copying vllm/adapter_commons/worker_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T07:02:54.2128053Z #34 4.213 creating 
build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T07:02:54.2128685Z #34 4.213 copying vllm/assets/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T07:02:54.2129408Z #34 4.213 copying vllm/assets/audio.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T07:02:54.2130136Z #34 4.213 copying vllm/assets/base.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T07:02:54.2130957Z #34 4.213 copying vllm/assets/image.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T07:02:54.2131890Z #34 4.214 copying vllm/assets/video.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T07:02:54.2132566Z #34 4.214 creating build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T07:02:54.2133245Z #34 4.214 copying vllm/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T07:02:54.2134060Z #34 4.214 copying vllm/attention/layer.py -> build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T07:02:54.2134875Z #34 4.215 copying vllm/attention/selector.py -> build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T07:02:54.2135591Z #34 4.215 creating build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T07:02:54.2136357Z #34 4.215 copying vllm/benchmarks/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T07:02:54.2137204Z #34 4.215 copying vllm/benchmarks/datasets.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T07:02:54.2138072Z #34 4.215 copying vllm/benchmarks/latency.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T07:02:54.2138903Z #34 4.216 copying vllm/benchmarks/serve.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T07:02:54.2139854Z #34 4.216 copying vllm/benchmarks/throughput.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T07:02:54.2140594Z #34 4.216 creating build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2141325Z #34 4.216 
copying vllm/compilation/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2142278Z #34 4.217 copying vllm/compilation/activation_quant_fusion.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2143348Z #34 4.217 copying vllm/compilation/backends.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2144253Z #34 4.217 copying vllm/compilation/base_static_graph.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2145187Z #34 4.217 copying vllm/compilation/collective_fusion.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2146150Z #34 4.218 copying vllm/compilation/compiler_interface.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2147062Z #34 4.218 copying vllm/compilation/counter.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2147914Z #34 4.218 copying vllm/compilation/cuda_graph.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2149028Z #34 4.218 copying vllm/compilation/cuda_piecewise_backend.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2150166Z #34 4.218 copying vllm/compilation/decorators.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2151150Z #34 4.219 copying vllm/compilation/fix_functionalization.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2152111Z #34 4.219 copying vllm/compilation/fusion.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2152985Z #34 4.219 copying vllm/compilation/fusion_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2153879Z #34 4.219 copying vllm/compilation/fx_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2154836Z #34 4.219 copying vllm/compilation/inductor_pass.py -> 
build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2155746Z #34 4.220 copying vllm/compilation/monitor.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2156675Z #34 4.220 copying vllm/compilation/multi_output_match.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2157640Z #34 4.220 copying vllm/compilation/noop_elimination.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2158596Z #34 4.220 copying vllm/compilation/pass_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2159558Z #34 4.221 copying vllm/compilation/sequence_parallelism.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2160600Z #34 4.221 copying vllm/compilation/torch25_custom_graph_pass.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2161619Z #34 4.221 copying vllm/compilation/vllm_inductor_pass.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2162627Z #34 4.221 copying vllm/compilation/wrapper.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T07:02:54.2163326Z #34 4.222 creating build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T07:02:54.2163942Z #34 4.222 copying vllm/config/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T07:02:54.2164717Z #34 4.222 copying vllm/config/cache.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T07:02:54.2165489Z #34 4.222 copying vllm/config/compilation.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T07:02:54.2166260Z #34 4.222 copying vllm/config/parallel.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T07:02:54.2167030Z #34 4.223 copying vllm/config/scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T07:02:54.2167850Z #34 4.223 copying vllm/config/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/config 
2025-09-07T07:02:54.2168475Z #34 4.223 creating build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2169075Z #34 4.223 copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2169785Z #34 4.224 copying vllm/core/block_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2170520Z #34 4.224 copying vllm/core/evictor.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2171490Z #34 4.224 copying vllm/core/interfaces.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2172370Z #34 4.224 copying vllm/core/placeholder_block_space_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2173235Z #34 4.224 copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T07:02:54.2173929Z #34 4.225 creating build/lib.linux-x86_64-cpython-312/vllm/device_allocator 2025-09-07T07:02:54.2174758Z #34 4.225 copying vllm/device_allocator/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/device_allocator 2025-09-07T07:02:54.2175702Z #34 4.225 copying vllm/device_allocator/cumem.py -> build/lib.linux-x86_64-cpython-312/vllm/device_allocator 2025-09-07T07:02:54.2176461Z #34 4.225 creating build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T07:02:54.2177185Z #34 4.226 copying vllm/distributed/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T07:02:54.2178094Z #34 4.226 copying vllm/distributed/communication_op.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T07:02:54.2179027Z #34 4.226 copying vllm/distributed/kv_events.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T07:02:54.2179942Z #34 4.226 copying vllm/distributed/parallel_state.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T07:02:54.2180915Z #34 4.226 copying vllm/distributed/tpu_distributed_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 
2025-09-07T07:02:54.2181853Z #34 4.227 copying vllm/distributed/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T07:02:54.2182602Z #34 4.227 creating build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2183362Z #34 4.227 copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2184107Z #34 4.227 copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2184889Z #34 4.227 copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2185701Z #34 4.228 copying vllm/engine/async_timeout.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2186474Z #34 4.228 copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2187235Z #34 4.228 copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2188006Z #34 4.228 copying vllm/engine/metrics_types.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2188793Z #34 4.229 copying vllm/engine/protocol.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T07:02:54.2189471Z #34 4.229 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2190161Z #34 4.229 copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2191008Z #34 4.229 copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2191896Z #34 4.230 copying vllm/entrypoints/chat_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2192770Z #34 4.230 copying vllm/entrypoints/constants.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2193636Z #34 4.230 copying vllm/entrypoints/context.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2194560Z 
#34 4.230 copying vllm/entrypoints/harmony_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2195444Z #34 4.230 copying vllm/entrypoints/launcher.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2196264Z #34 4.231 copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2197082Z #34 4.231 copying vllm/entrypoints/logger.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2197935Z #34 4.231 copying vllm/entrypoints/renderer.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2198793Z #34 4.231 copying vllm/entrypoints/score_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2199626Z #34 4.231 copying vllm/entrypoints/ssl.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2200417Z #34 4.232 copying vllm/entrypoints/tool.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2201261Z #34 4.232 copying vllm/entrypoints/tool_server.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2202110Z #34 4.232 copying vllm/entrypoints/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T07:02:54.2202788Z #34 4.232 creating build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2203446Z #34 4.233 copying vllm/executor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2204241Z #34 4.233 copying vllm/executor/executor_base.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2205137Z #34 4.233 copying vllm/executor/mp_distributed_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2206026Z #34 4.233 copying vllm/executor/msgspec_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2206906Z #34 4.233 copying vllm/executor/multiproc_worker_utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2207855Z #34 4.234 copying vllm/executor/ray_distributed_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2208742Z #34 4.234 copying vllm/executor/ray_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2209577Z #34 4.234 copying vllm/executor/uniproc_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T07:02:54.2210288Z #34 4.234 creating build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T07:02:54.2210979Z #34 4.235 copying vllm/inputs/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T07:02:54.2211896Z #34 4.235 copying vllm/inputs/data.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T07:02:54.2212630Z #34 4.235 copying vllm/inputs/parse.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T07:02:54.2213416Z #34 4.235 copying vllm/inputs/preprocess.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T07:02:54.2214212Z #34 4.235 copying vllm/inputs/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T07:02:54.2214918Z #34 4.236 creating build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T07:02:54.2215662Z #34 4.236 copying vllm/logging_utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T07:02:54.2216543Z #34 4.236 copying vllm/logging_utils/dump_input.py -> build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T07:02:54.2217461Z #34 4.236 copying vllm/logging_utils/formatter.py -> build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T07:02:54.2218212Z #34 4.237 creating build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2218832Z #34 4.237 copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2219612Z #34 4.237 copying vllm/lora/fully_sharded_layers.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 
2025-09-07T07:02:54.2220420Z #34 4.237 copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2221164Z #34 4.237 copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2221857Z #34 4.238 copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2222600Z #34 4.238 copying vllm/lora/peft_helper.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2223455Z #34 4.238 copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2224161Z #34 4.238 copying vllm/lora/resolver.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2224869Z #34 4.238 copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2225589Z #34 4.239 copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T07:02:54.2226282Z #34 4.239 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T07:02:54.2227010Z #34 4.239 copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T07:02:54.2227894Z #34 4.239 copying vllm/model_executor/custom_op.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T07:02:54.2228801Z #34 4.239 copying vllm/model_executor/parameter.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T07:02:54.2229735Z #34 4.240 copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T07:02:54.2230657Z #34 4.240 copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T07:02:54.2231361Z #34 4.240 creating build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2232045Z #34 4.240 copying vllm/multimodal/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2232851Z #34 4.241 
copying vllm/multimodal/audio.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2233637Z #34 4.241 copying vllm/multimodal/base.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2234435Z #34 4.241 copying vllm/multimodal/cache.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2235264Z #34 4.241 copying vllm/multimodal/hasher.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2236072Z #34 4.241 copying vllm/multimodal/image.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2236878Z #34 4.242 copying vllm/multimodal/inputs.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2237674Z #34 4.242 copying vllm/multimodal/parse.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2238526Z #34 4.242 copying vllm/multimodal/processing.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2239390Z #34 4.242 copying vllm/multimodal/profiling.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2240235Z #34 4.243 copying vllm/multimodal/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2241060Z #34 4.243 copying vllm/multimodal/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2241849Z #34 4.243 copying vllm/multimodal/video.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T07:02:54.2242531Z #34 4.243 creating build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2243206Z #34 4.243 copying vllm/platforms/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2244002Z #34 4.244 copying vllm/platforms/cpu.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2244774Z #34 4.244 copying vllm/platforms/cuda.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2245570Z #34 4.244 
copying vllm/platforms/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2246377Z #34 4.244 copying vllm/platforms/rocm.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2247209Z #34 4.245 copying vllm/platforms/tpu.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2247957Z #34 4.245 copying vllm/platforms/xpu.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T07:02:54.2248618Z #34 4.245 creating build/lib.linux-x86_64-cpython-312/vllm/plugins 2025-09-07T07:02:54.2249604Z #34 4.245 copying vllm/plugins/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins 2025-09-07T07:02:54.2250286Z #34 4.245 creating build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T07:02:54.2251025Z #34 4.246 copying vllm/profiler/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T07:02:54.2251893Z #34 4.246 copying vllm/profiler/layerwise_profile.py -> build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T07:02:54.2252752Z #34 4.246 copying vllm/profiler/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T07:02:54.2253403Z #34 4.246 creating build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T07:02:54.2254006Z #34 4.246 copying vllm/ray/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T07:02:54.2254704Z #34 4.247 copying vllm/ray/lazy_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T07:02:54.2255414Z #34 4.247 copying vllm/ray/ray_env.py -> build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T07:02:54.2256056Z #34 4.247 creating build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2256732Z #34 4.247 copying vllm/reasoning/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2257623Z #34 4.247 copying vllm/reasoning/abs_reasoning_parsers.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2258619Z #34 4.248 
copying vllm/reasoning/deepseek_r1_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2259636Z #34 4.248 copying vllm/reasoning/glm4_moe_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2260637Z #34 4.248 copying vllm/reasoning/gptoss_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2261694Z #34 4.248 copying vllm/reasoning/granite_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2262838Z #34 4.248 copying vllm/reasoning/hunyuan_a13b_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.2263822Z #34 4.249 copying vllm/reasoning/mistral_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.3107456Z #34 4.249 copying vllm/reasoning/qwen3_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.3108511Z #34 4.249 copying vllm/reasoning/step3_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T07:02:54.3109281Z #34 4.249 creating build/lib.linux-x86_64-cpython-312/vllm/third_party 2025-09-07T07:02:54.3109998Z #34 4.249 copying vllm/third_party/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/third_party 2025-09-07T07:02:54.3110812Z #34 4.250 copying vllm/third_party/pynvml.py -> build/lib.linux-x86_64-cpython-312/vllm/third_party 2025-09-07T07:02:54.3111557Z #34 4.250 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3112374Z #34 4.250 copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3113332Z #34 4.251 copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3114486Z #34 4.251 copying vllm/transformers_utils/detokenizer.py -> 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3115545Z #34 4.251 copying vllm/transformers_utils/detokenizer_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3116614Z #34 4.251 copying vllm/transformers_utils/dynamic_module.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3117779Z #34 4.251 copying vllm/transformers_utils/processor.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3118759Z #34 4.252 copying vllm/transformers_utils/s3_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3119753Z #34 4.252 copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3120764Z #34 4.252 copying vllm/transformers_utils/tokenizer_base.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3121818Z #34 4.252 copying vllm/transformers_utils/tokenizer_group.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3122825Z #34 4.252 copying vllm/transformers_utils/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T07:02:54.3123589Z #34 4.253 creating build/lib.linux-x86_64-cpython-312/vllm/triton_utils 2025-09-07T07:02:54.3124303Z #34 4.253 copying vllm/triton_utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/triton_utils 2025-09-07T07:02:54.3125143Z #34 4.253 copying vllm/triton_utils/importing.py -> build/lib.linux-x86_64-cpython-312/vllm/triton_utils 2025-09-07T07:02:54.3125844Z #34 4.254 creating build/lib.linux-x86_64-cpython-312/vllm/usage 2025-09-07T07:02:54.3126457Z #34 4.254 copying vllm/usage/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/usage 2025-09-07T07:02:54.3127164Z #34 4.254 copying vllm/usage/usage_lib.py -> build/lib.linux-x86_64-cpython-312/vllm/usage 2025-09-07T07:02:54.3127795Z #34 4.254 creating 
build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T07:02:54.3128389Z #34 4.254 copying vllm/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T07:02:54.3129106Z #34 4.255 copying vllm/utils/deep_gemm.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T07:02:54.3129856Z #34 4.255 copying vllm/utils/flashinfer.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T07:02:54.3130604Z #34 4.255 copying vllm/utils/jsontree.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T07:02:54.3131751Z #34 4.255 copying vllm/utils/tensor_schema.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T07:02:54.3132402Z #34 4.256 creating build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3132987Z #34 4.256 copying vllm/v1/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3133731Z #34 4.256 copying vllm/v1/cudagraph_dispatcher.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3134541Z #34 4.256 copying vllm/v1/kv_cache_interface.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3135279Z #34 4.256 copying vllm/v1/outputs.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3135960Z #34 4.257 copying vllm/v1/request.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3136669Z #34 4.257 copying vllm/v1/serial_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3137363Z #34 4.257 copying vllm/v1/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T07:02:54.3137975Z #34 4.257 creating build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3138624Z #34 4.257 copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3139395Z #34 4.258 copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3140246Z #34 4.258 copying vllm/worker/enc_dec_model_runner.py -> 
build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3141114Z #34 4.258 copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3141952Z #34 4.258 copying vllm/worker/model_runner_base.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3142754Z #34 4.259 copying vllm/worker/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3143531Z #34 4.259 copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3154516Z #34 4.259 copying vllm/worker/worker_base.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T07:02:54.3155403Z #34 4.259 creating build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3156268Z #34 4.259 copying vllm/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3157300Z #34 4.260 copying vllm/attention/backends/abstract.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3158449Z #34 4.260 copying vllm/attention/backends/differential_flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3159636Z #34 4.260 copying vllm/attention/backends/dual_chunk_flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3160731Z #34 4.260 copying vllm/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3161791Z #34 4.261 copying vllm/attention/backends/flashmla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3162974Z #34 4.261 copying vllm/attention/backends/placeholder_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3164042Z #34 4.261 copying vllm/attention/backends/rocm_aiter_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 
2025-09-07T07:02:54.3165094Z #34 4.261 copying vllm/attention/backends/rocm_flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3166125Z #34 4.262 copying vllm/attention/backends/triton_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3167119Z #34 4.262 copying vllm/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3168116Z #34 4.262 copying vllm/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T07:02:54.3168932Z #34 4.262 creating build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T07:02:54.3169823Z #34 4.262 copying vllm/attention/layers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T07:02:54.3170843Z #34 4.263 copying vllm/attention/layers/chunked_local_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T07:02:54.3172267Z #34 4.263 copying vllm/attention/layers/encoder_only_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T07:02:54.3173158Z #34 4.263 creating build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3173904Z #34 4.263 copying vllm/attention/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3174917Z #34 4.263 copying vllm/attention/ops/chunked_prefill_paged_decode.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3175930Z #34 4.264 copying vllm/attention/ops/common.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3176848Z #34 4.264 copying vllm/attention/ops/flashmla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3177821Z #34 4.264 copying vllm/attention/ops/merge_attn_states.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3178779Z #34 4.264 copying 
vllm/attention/ops/paged_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3179840Z #34 4.265 copying vllm/attention/ops/pallas_kv_cache_update.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3180851Z #34 4.265 copying vllm/attention/ops/prefix_prefill.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3181824Z #34 4.265 copying vllm/attention/ops/rocm_aiter_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3182877Z #34 4.265 copying vllm/attention/ops/rocm_aiter_paged_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3184063Z #34 4.265 copying vllm/attention/ops/triton_decode_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3185108Z #34 4.266 copying vllm/attention/ops/triton_flash_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3186130Z #34 4.266 copying vllm/attention/ops/triton_merge_attn_states.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3187181Z #34 4.266 copying vllm/attention/ops/triton_unified_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T07:02:54.3188022Z #34 4.266 creating build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T07:02:54.3188773Z #34 4.266 copying vllm/attention/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T07:02:54.3189685Z #34 4.267 copying vllm/attention/utils/fa_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T07:02:54.3190631Z #34 4.267 copying vllm/attention/utils/kv_sharing_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T07:02:54.3191473Z #34 4.267 creating build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla 2025-09-07T07:02:54.3192343Z #34 4.267 copying vllm/attention/backends/mla/__init__.py -> 
build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla 2025-09-07T07:02:54.3193397Z #34 4.267 copying vllm/attention/backends/mla/common.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla 2025-09-07T07:02:54.3194241Z #34 4.268 creating build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T07:02:54.3194975Z #34 4.268 copying vllm/benchmarks/lib/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T07:02:54.3195943Z #34 4.268 copying vllm/benchmarks/lib/endpoint_request_func.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T07:02:54.3196960Z #34 4.268 copying vllm/benchmarks/lib/ready_checker.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T07:02:54.3197869Z #34 4.269 copying vllm/benchmarks/lib/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T07:02:54.3198630Z #34 4.269 creating build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3199303Z #34 4.269 copying vllm/core/block/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3200126Z #34 4.269 copying vllm/core/block/block_table.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3200959Z #34 4.269 copying vllm/core/block/common.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3201833Z #34 4.270 copying vllm/core/block/cpu_gpu_block_allocator.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3202747Z #34 4.270 copying vllm/core/block/interfaces.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3203595Z #34 4.270 copying vllm/core/block/naive_block.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3204493Z #34 4.270 copying vllm/core/block/prefix_caching_block.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3205364Z #34 4.271 copying vllm/core/block/utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T07:02:54.3206224Z #34 4.271 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3207271Z #34 4.271 copying vllm/distributed/device_communicators/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3208525Z #34 4.271 copying vllm/distributed/device_communicators/all2all.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3209835Z #34 4.271 copying vllm/distributed/device_communicators/all_reduce_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3211887Z #34 4.272 copying vllm/distributed/device_communicators/base_device_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3213324Z #34 4.272 copying vllm/distributed/device_communicators/cpu_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3214737Z #34 4.272 copying vllm/distributed/device_communicators/cuda_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3216131Z #34 4.272 copying vllm/distributed/device_communicators/cuda_wrapper.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3217497Z #34 4.273 copying vllm/distributed/device_communicators/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3218850Z #34 4.273 copying vllm/distributed/device_communicators/pynccl.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3220208Z #34 4.273 copying vllm/distributed/device_communicators/pynccl_wrapper.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3221583Z #34 4.273 
copying vllm/distributed/device_communicators/quick_all_reduce.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3222980Z #34 4.274 copying vllm/distributed/device_communicators/ray_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3224430Z #34 4.274 copying vllm/distributed/device_communicators/shm_broadcast.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3225726Z #34 4.274 copying vllm/distributed/device_communicators/symm_mem.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3227045Z #34 4.274 copying vllm/distributed/device_communicators/tpu_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3228440Z #34 4.274 copying vllm/distributed/device_communicators/xpu_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T07:02:54.3229441Z #34 4.275 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T07:02:54.3230228Z #34 4.275 copying vllm/distributed/eplb/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T07:02:54.3231173Z #34 4.275 copying vllm/distributed/eplb/eplb_state.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T07:02:54.3232179Z #34 4.275 copying vllm/distributed/eplb/rebalance_algo.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T07:02:54.3233212Z #34 4.276 copying vllm/distributed/eplb/rebalance_execute.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T07:02:54.3234084Z #34 4.276 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T07:02:54.3234958Z #34 4.276 copying vllm/distributed/kv_transfer/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 
2025-09-07T07:02:54.3236049Z #34 4.276 copying vllm/distributed/kv_transfer/kv_transfer_state.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T07:02:54.3237023Z #34 4.276 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T07:02:54.3238088Z #34 4.277 copying vllm/distributed/kv_transfer/kv_connector/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T07:02:54.3239379Z #34 4.277 copying vllm/distributed/kv_transfer/kv_connector/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T07:02:54.3240680Z #34 4.277 copying vllm/distributed/kv_transfer/kv_connector/factory.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T07:02:54.3242032Z #34 4.277 copying vllm/distributed/kv_transfer/kv_connector/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T07:02:54.3243110Z #34 4.277 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T07:02:54.3244219Z #34 4.278 copying vllm/distributed/kv_transfer/kv_lookup_buffer/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T07:02:54.3245576Z #34 4.278 copying vllm/distributed/kv_transfer/kv_lookup_buffer/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T07:02:54.3246991Z #34 4.278 copying vllm/distributed/kv_transfer/kv_lookup_buffer/mooncake_store.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T07:02:54.3248438Z #34 4.278 copying vllm/distributed/kv_transfer/kv_lookup_buffer/simple_buffer.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T07:02:54.3249923Z #34 4.279 creating 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T07:02:54.3251023Z #34 4.279 copying vllm/distributed/kv_transfer/kv_pipe/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T07:02:54.3252237Z #34 4.279 copying vllm/distributed/kv_transfer/kv_pipe/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T07:02:54.3253514Z #34 4.279 copying vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T07:02:54.3254821Z #34 4.279 copying vllm/distributed/kv_transfer/kv_pipe/pynccl_pipe.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T07:02:54.3255890Z #34 4.280 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3257037Z #34 4.280 copying vllm/distributed/kv_transfer/kv_connector/v1/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3258497Z #34 4.280 copying vllm/distributed/kv_transfer/kv_connector/v1/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3259969Z #34 4.280 copying vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3261502Z #34 4.280 copying vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3263097Z #34 4.281 copying vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3264601Z #34 4.281 copying vllm/distributed/kv_transfer/kv_connector/v1/shared_storage_connector.py -> 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T07:02:54.3265830Z #34 4.281 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T07:02:54.3266984Z #34 4.281 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T07:02:54.3268487Z #34 4.282 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T07:02:54.3270075Z #34 4.282 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T07:02:54.3271610Z #34 4.282 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T07:02:54.3272841Z #34 4.282 creating build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T07:02:54.3273720Z #34 4.282 copying vllm/engine/multiprocessing/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T07:02:54.3274809Z #34 4.283 copying vllm/engine/multiprocessing/client.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T07:02:54.3275899Z #34 4.283 copying vllm/engine/multiprocessing/engine.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T07:02:54.3276778Z #34 4.283 creating build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T07:02:54.3277657Z #34 4.283 copying vllm/engine/output_processor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T07:02:54.3278731Z #34 4.283 copying vllm/engine/output_processor/interfaces.py -> 
build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T07:02:54.3279851Z #34 4.284 copying vllm/engine/output_processor/single_step.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T07:02:54.3280967Z #34 4.284 copying vllm/engine/output_processor/stop_checker.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T07:02:54.3282033Z #34 4.284 copying vllm/engine/output_processor/util.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T07:02:54.3282870Z #34 4.285 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3283624Z #34 4.285 copying vllm/entrypoints/cli/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3284568Z #34 4.285 copying vllm/entrypoints/cli/collect_env.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3285501Z #34 4.285 copying vllm/entrypoints/cli/main.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3286402Z #34 4.285 copying vllm/entrypoints/cli/openai.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3287338Z #34 4.285 copying vllm/entrypoints/cli/run_batch.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3288284Z #34 4.286 copying vllm/entrypoints/cli/serve.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3289191Z #34 4.286 copying vllm/entrypoints/cli/types.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T07:02:54.3289968Z #34 4.286 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3290773Z #34 4.286 copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3292059Z #34 4.287 copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 
2025-09-07T07:02:54.3293086Z #34 4.287 copying vllm/entrypoints/openai/cli_args.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3294186Z #34 4.287 copying vllm/entrypoints/openai/logits_processors.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3295291Z #34 4.287 copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3296324Z #34 4.288 copying vllm/entrypoints/openai/run_batch.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3297387Z #34 4.288 copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3298572Z #34 4.288 copying vllm/entrypoints/openai/serving_classification.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3299746Z #34 4.288 copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3300905Z #34 4.289 copying vllm/entrypoints/openai/serving_embedding.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3302073Z #34 4.289 copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3303286Z #34 4.289 copying vllm/entrypoints/openai/serving_models.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3304365Z #34 4.289 copying vllm/entrypoints/openai/serving_pooling.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3305450Z #34 4.289 copying vllm/entrypoints/openai/serving_responses.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3306540Z #34 4.290 copying vllm/entrypoints/openai/serving_score.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3307632Z #34 
4.290 copying vllm/entrypoints/openai/serving_tokenization.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3308792Z #34 4.290 copying vllm/entrypoints/openai/serving_transcription.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3309908Z #34 4.290 copying vllm/entrypoints/openai/speech_to_text.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T07:02:54.3310773Z #34 4.291 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3311690Z #34 4.291 copying vllm/entrypoints/cli/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3312796Z #34 4.291 copying vllm/entrypoints/cli/benchmark/base.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3313929Z #34 4.291 copying vllm/entrypoints/cli/benchmark/latency.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3315060Z #34 4.291 copying vllm/entrypoints/cli/benchmark/main.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3316172Z #34 4.292 copying vllm/entrypoints/cli/benchmark/serve.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3317368Z #34 4.292 copying vllm/entrypoints/cli/benchmark/throughput.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T07:02:54.3318354Z #34 4.292 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3319346Z #34 4.292 copying vllm/entrypoints/openai/tool_parsers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3320650Z #34 4.293 copying vllm/entrypoints/openai/tool_parsers/abstract_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3322030Z #34 
4.293 copying vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3323436Z #34 4.293 copying vllm/entrypoints/openai/tool_parsers/deepseekv3_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3324818Z #34 4.293 copying vllm/entrypoints/openai/tool_parsers/glm4_moe_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3326186Z #34 4.294 copying vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3327599Z #34 4.294 copying vllm/entrypoints/openai/tool_parsers/granite_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3328953Z #34 4.294 copying vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3330310Z #34 4.294 copying vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3332034Z #34 4.295 copying vllm/entrypoints/openai/tool_parsers/internlm2_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3333424Z #34 4.295 copying vllm/entrypoints/openai/tool_parsers/jamba_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3334805Z #34 4.295 copying vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3336242Z #34 4.295 copying vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 
2025-09-07T07:02:54.3337654Z #34 4.296 copying vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3339047Z #34 4.296 copying vllm/entrypoints/openai/tool_parsers/minimax_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3340461Z #34 4.296 copying vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3341844Z #34 4.296 copying vllm/entrypoints/openai/tool_parsers/openai_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3343359Z #34 4.297 copying vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3344744Z #34 4.297 copying vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3346117Z #34 4.297 copying vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3347502Z #34 4.297 copying vllm/entrypoints/openai/tool_parsers/seed_oss_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3349027Z #34 4.297 copying vllm/entrypoints/openai/tool_parsers/step3_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3350508Z #34 4.298 copying vllm/entrypoints/openai/tool_parsers/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T07:02:54.3351825Z #34 4.298 copying vllm/entrypoints/openai/tool_parsers/xlam_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 
2025-09-07T07:02:54.3352784Z #34 4.298 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops 2025-09-07T07:02:54.3353460Z #34 4.298 copying vllm/lora/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops 2025-09-07T07:02:54.3354177Z #34 4.299 creating build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3355021Z #34 4.299 copying vllm/lora/punica_wrapper/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3356062Z #34 4.299 copying vllm/lora/punica_wrapper/punica_base.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3357103Z #34 4.299 copying vllm/lora/punica_wrapper/punica_cpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3358152Z #34 4.299 copying vllm/lora/punica_wrapper/punica_gpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3359288Z #34 4.299 copying vllm/lora/punica_wrapper/punica_selector.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3360354Z #34 4.300 copying vllm/lora/punica_wrapper/punica_tpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3361540Z #34 4.300 copying vllm/lora/punica_wrapper/punica_xpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3362607Z #34 4.300 copying vllm/lora/punica_wrapper/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T07:02:54.3363420Z #34 4.300 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops 2025-09-07T07:02:54.3364209Z #34 4.301 copying vllm/lora/ops/ipex_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops 2025-09-07T07:02:54.3365128Z #34 4.301 copying vllm/lora/ops/ipex_ops/lora_ops.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops 2025-09-07T07:02:54.3365926Z #34 4.301 creating 
build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops 2025-09-07T07:02:54.3366715Z #34 4.301 copying vllm/lora/ops/torch_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops 2025-09-07T07:02:54.3367668Z #34 4.301 copying vllm/lora/ops/torch_ops/lora_ops.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops 2025-09-07T07:02:54.3368477Z #34 4.302 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3369281Z #34 4.302 copying vllm/lora/ops/triton_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3370282Z #34 4.302 copying vllm/lora/ops/triton_ops/kernel_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3371580Z #34 4.302 copying vllm/lora/ops/triton_ops/lora_expand_op.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3372675Z #34 4.302 copying vllm/lora/ops/triton_ops/lora_kernel_metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3373794Z #34 4.303 copying vllm/lora/ops/triton_ops/lora_shrink_op.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3374831Z #34 4.303 copying vllm/lora/ops/triton_ops/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T07:02:54.3375643Z #34 4.303 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops 2025-09-07T07:02:54.3376438Z #34 4.303 copying vllm/lora/ops/xla_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops 2025-09-07T07:02:54.3377413Z #34 4.304 copying vllm/lora/ops/xla_ops/lora_ops.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops 2025-09-07T07:02:54.3378232Z #34 4.304 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3379095Z #34 4.304 copying vllm/model_executor/layers/__init__.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3380155Z #34 4.304 copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3381304Z #34 4.304 copying vllm/model_executor/layers/attention_layer_base.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3382445Z #34 4.305 copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3383661Z #34 4.305 copying vllm/model_executor/layers/lightning_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3384722Z #34 4.305 copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3385776Z #34 4.305 copying vllm/model_executor/layers/logits_processor.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3386825Z #34 4.306 copying vllm/model_executor/layers/mla.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3387879Z #34 4.306 copying vllm/model_executor/layers/pooler.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3388901Z #34 4.306 copying vllm/model_executor/layers/resampler.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3389935Z #34 4.306 copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3390994Z #34 4.306 copying vllm/model_executor/layers/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3392096Z #34 4.307 copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T07:02:54.3393047Z #34 4.307 creating 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3393947Z #34 4.307 copying vllm/model_executor/model_loader/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3395083Z #34 4.307 copying vllm/model_executor/model_loader/base_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3396280Z #34 4.308 copying vllm/model_executor/model_loader/bitsandbytes_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3397502Z #34 4.308 copying vllm/model_executor/model_loader/default_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3398684Z #34 4.308 copying vllm/model_executor/model_loader/dummy_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3399828Z #34 4.308 copying vllm/model_executor/model_loader/gguf_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3401006Z #34 4.308 copying vllm/model_executor/model_loader/runai_streamer_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3402252Z #34 4.309 copying vllm/model_executor/model_loader/sharded_state_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3403430Z #34 4.309 copying vllm/model_executor/model_loader/tensorizer.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3404611Z #34 4.309 copying vllm/model_executor/model_loader/tensorizer_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3405788Z #34 4.309 copying vllm/model_executor/model_loader/tpu.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3406868Z #34 4.310 copying 
vllm/model_executor/model_loader/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3407993Z #34 4.310 copying vllm/model_executor/model_loader/weight_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T07:02:54.3408895Z #34 4.311 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3409725Z #34 4.311 copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3410746Z #34 4.311 copying vllm/model_executor/models/adapters.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3412022Z #34 4.312 copying vllm/model_executor/models/aimv2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3413068Z #34 4.312 copying vllm/model_executor/models/apertus.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3414093Z #34 4.312 copying vllm/model_executor/models/arcee.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3415120Z #34 4.312 copying vllm/model_executor/models/arctic.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3416178Z #34 4.313 copying vllm/model_executor/models/aria.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3417209Z #34 4.313 copying vllm/model_executor/models/aya_vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3418276Z #34 4.313 copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3419395Z #34 4.313 copying vllm/model_executor/models/bailing_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3420449Z #34 4.314 copying vllm/model_executor/models/bamba.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3421462Z #34 4.314 copying vllm/model_executor/models/bart.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3422463Z #34 4.314 copying vllm/model_executor/models/bert.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3423624Z #34 4.314 copying vllm/model_executor/models/bert_with_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3424642Z #34 4.314 copying vllm/model_executor/models/blip.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3425636Z #34 4.315 copying vllm/model_executor/models/blip2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3426630Z #34 4.315 copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3427638Z #34 4.315 copying vllm/model_executor/models/chameleon.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3428664Z #34 4.315 copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3429662Z #34 4.316 copying vllm/model_executor/models/clip.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3430687Z #34 4.316 copying vllm/model_executor/models/cohere2_vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3431748Z #34 4.316 copying vllm/model_executor/models/commandr.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3432762Z #34 4.316 copying vllm/model_executor/models/config.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3433835Z #34 4.317 copying vllm/model_executor/models/constant_size_cache.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3434922Z #34 4.317 copying vllm/model_executor/models/dbrx.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3435919Z #34 4.317 copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3436976Z #34 4.317 copying vllm/model_executor/models/deepseek_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3438045Z #34 4.317 copying vllm/model_executor/models/deepseek_mtp.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3439100Z #34 4.318 copying vllm/model_executor/models/deepseek_v2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3440160Z #34 4.318 copying vllm/model_executor/models/deepseek_vl2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3441181Z #34 4.318 copying vllm/model_executor/models/donut.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3442171Z #34 4.318 copying vllm/model_executor/models/dots1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3443167Z #34 4.319 copying vllm/model_executor/models/ernie45.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3444234Z #34 4.319 copying vllm/model_executor/models/ernie45_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3445275Z #34 4.319 copying vllm/model_executor/models/ernie45_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3446319Z #34 4.319 copying vllm/model_executor/models/ernie45_vl_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3447419Z #34 4.320 copying vllm/model_executor/models/ernie_mtp.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3448436Z #34 4.320 copying vllm/model_executor/models/exaone.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3449779Z #34 4.320 copying vllm/model_executor/models/exaone4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3450944Z #34 4.320 copying vllm/model_executor/models/fairseq2_llama.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3452022Z #34 4.320 copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3453069Z #34 4.321 copying vllm/model_executor/models/falcon_h1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3454137Z #34 4.321 copying vllm/model_executor/models/florence2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3455177Z #34 4.321 copying vllm/model_executor/models/fuyu.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3456202Z #34 4.321 copying vllm/model_executor/models/gemma.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3457223Z #34 4.322 copying vllm/model_executor/models/gemma2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3458263Z #34 4.322 copying vllm/model_executor/models/gemma3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3459306Z #34 4.322 copying vllm/model_executor/models/gemma3_mm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3460350Z #34 4.322 copying vllm/model_executor/models/gemma3n.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3461408Z #34 4.322 copying vllm/model_executor/models/gemma3n_mm.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3462435Z #34 4.323 copying vllm/model_executor/models/glm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3463599Z #34 4.323 copying vllm/model_executor/models/glm4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3464590Z #34 4.323 copying vllm/model_executor/models/glm4_1v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3465583Z #34 4.323 copying vllm/model_executor/models/glm4_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3466607Z #34 4.324 copying vllm/model_executor/models/glm4_moe_mtp.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3467623Z #34 4.324 copying vllm/model_executor/models/glm4v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3468601Z #34 4.324 copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3469614Z #34 4.324 copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3470618Z #34 4.325 copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3471611Z #34 4.325 copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3472654Z #34 4.325 copying vllm/model_executor/models/gpt_oss.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3473650Z #34 4.325 copying vllm/model_executor/models/granite.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3474698Z #34 4.325 copying vllm/model_executor/models/granite_speech.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3475838Z #34 4.326 copying vllm/model_executor/models/granitemoe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3476936Z #34 4.326 copying vllm/model_executor/models/granitemoehybrid.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3478071Z #34 4.326 copying vllm/model_executor/models/granitemoeshared.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3479132Z #34 4.326 copying vllm/model_executor/models/gritlm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3480138Z #34 4.327 copying vllm/model_executor/models/grok1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3481119Z #34 4.327 copying vllm/model_executor/models/h2ovl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3482132Z #34 4.327 copying vllm/model_executor/models/hunyuan_v1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3483231Z #34 4.327 copying vllm/model_executor/models/hyperclovax_vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3484371Z #34 4.328 copying vllm/model_executor/models/idefics2_vision_model.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3485463Z #34 4.328 copying vllm/model_executor/models/idefics3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3486503Z #34 4.328 copying vllm/model_executor/models/interfaces.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3487566Z #34 4.328 copying vllm/model_executor/models/interfaces_base.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3488629Z #34 4.328 copying 
vllm/model_executor/models/intern_vit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3489663Z #34 4.329 copying vllm/model_executor/models/internlm2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3490719Z #34 4.329 copying vllm/model_executor/models/internlm2_ve.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3492089Z #34 4.329 copying vllm/model_executor/models/interns1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3493164Z #34 4.329 copying vllm/model_executor/models/interns1_vit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3494240Z #34 4.330 copying vllm/model_executor/models/internvl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3495262Z #34 4.330 copying vllm/model_executor/models/jais.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3496281Z #34 4.330 copying vllm/model_executor/models/jamba.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3497308Z #34 4.330 copying vllm/model_executor/models/jina_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3498319Z #34 4.330 copying vllm/model_executor/models/keye.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3499354Z #34 4.331 copying vllm/model_executor/models/keye_vl1_5.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3500395Z #34 4.331 copying vllm/model_executor/models/kimi_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3501429Z #34 4.331 copying vllm/model_executor/models/lfm2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3502450Z #34 4.331 copying vllm/model_executor/models/llama.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3503574Z #34 4.332 copying vllm/model_executor/models/llama4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3504668Z #34 4.332 copying vllm/model_executor/models/llama4_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3505726Z #34 4.332 copying vllm/model_executor/models/llama_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3506768Z #34 4.332 copying vllm/model_executor/models/llama_eagle3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3507795Z #34 4.333 copying vllm/model_executor/models/llava.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3508796Z #34 4.333 copying vllm/model_executor/models/llava_next.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3509864Z #34 4.333 copying vllm/model_executor/models/llava_next_video.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3510956Z #34 4.333 copying vllm/model_executor/models/llava_onevision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3511989Z #34 4.334 copying vllm/model_executor/models/mamba.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3512994Z #34 4.334 copying vllm/model_executor/models/mamba2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3514014Z #34 4.334 copying vllm/model_executor/models/mamba_cache.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3515046Z #34 4.334 copying vllm/model_executor/models/medusa.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3516082Z #34 4.334 copying vllm/model_executor/models/midashenglm.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3517091Z #34 4.335 copying vllm/model_executor/models/mimo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3518086Z #34 4.335 copying vllm/model_executor/models/mimo_mtp.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3519099Z #34 4.335 copying vllm/model_executor/models/minicpm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3520143Z #34 4.335 copying vllm/model_executor/models/minicpm3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3521193Z #34 4.335 copying vllm/model_executor/models/minicpm_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3522238Z #34 4.336 copying vllm/model_executor/models/minicpmo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3523264Z #34 4.336 copying vllm/model_executor/models/minicpmv.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3524314Z #34 4.336 copying vllm/model_executor/models/minimax_cache.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3525385Z #34 4.336 copying vllm/model_executor/models/minimax_text_01.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3526450Z #34 4.337 copying vllm/model_executor/models/minimax_vl_01.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3527480Z #34 4.337 copying vllm/model_executor/models/mistral3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3528504Z #34 4.337 copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3529575Z #34 4.337 copying vllm/model_executor/models/mixtral_quant.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3530611Z #34 4.338 copying vllm/model_executor/models/mllama.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3531889Z #34 4.338 copying vllm/model_executor/models/mllama4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3533055Z #34 4.338 copying vllm/model_executor/models/mlp_speculator.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3534153Z #34 4.338 copying vllm/model_executor/models/modernbert.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3535264Z #34 4.339 copying vllm/model_executor/models/module_mapping.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3536329Z #34 4.339 copying vllm/model_executor/models/molmo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3537371Z #34 4.339 copying vllm/model_executor/models/moonvit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3538394Z #34 4.339 copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3539417Z #34 4.339 copying vllm/model_executor/models/nemotron.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3540483Z #34 4.340 copying vllm/model_executor/models/nemotron_h.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3541554Z #34 4.340 copying vllm/model_executor/models/nemotron_nas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3542636Z #34 4.340 copying vllm/model_executor/models/nemotron_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3543783Z #34 4.340 copying vllm/model_executor/models/nvlm_d.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3544758Z #34 4.341 copying vllm/model_executor/models/olmo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3545743Z #34 4.341 copying vllm/model_executor/models/olmo2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3546723Z #34 4.341 copying vllm/model_executor/models/olmoe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3547706Z #34 4.341 copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3548837Z #34 4.341 copying vllm/model_executor/models/orion.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3550020Z #34 4.342 copying vllm/model_executor/models/ovis.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3551047Z #34 4.342 copying vllm/model_executor/models/ovis2_5.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3552097Z #34 4.342 copying vllm/model_executor/models/paligemma.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3553168Z #34 4.342 copying vllm/model_executor/models/persimmon.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3554205Z #34 4.343 copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3555205Z #34 4.343 copying vllm/model_executor/models/phi3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3556218Z #34 4.343 copying vllm/model_executor/models/phi3v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3557297Z #34 4.343 copying vllm/model_executor/models/phi4_multimodal.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 
2025-09-07T07:02:54.3558454Z #34 4.343 copying vllm/model_executor/models/phi4flash.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3559516Z #34 4.344 copying vllm/model_executor/models/phi4mm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3560571Z #34 4.344 copying vllm/model_executor/models/phi4mm_audio.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3561840Z #34 4.344 copying vllm/model_executor/models/phi4mm_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3562876Z #34 4.345 copying vllm/model_executor/models/phimoe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3563882Z #34 4.345 copying vllm/model_executor/models/pixtral.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3564899Z #34 4.345 copying vllm/model_executor/models/plamo2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3565885Z #34 4.345 copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3566876Z #34 4.345 copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3567946Z #34 4.346 copying vllm/model_executor/models/qwen2_5_omni_thinker.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3569013Z #34 4.346 copying vllm/model_executor/models/qwen2_5_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3570048Z #34 4.346 copying vllm/model_executor/models/qwen2_audio.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3571163Z #34 4.346 copying vllm/model_executor/models/qwen2_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 
2025-09-07T07:02:54.3572369Z #34 4.347 copying vllm/model_executor/models/qwen2_rm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3573415Z #34 4.347 copying vllm/model_executor/models/qwen2_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3574430Z #34 4.347 copying vllm/model_executor/models/qwen3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3575476Z #34 4.347 copying vllm/model_executor/models/qwen3_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3576519Z #34 4.348 copying vllm/model_executor/models/qwen_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3577608Z #34 4.348 copying vllm/model_executor/models/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3578674Z #34 4.348 copying vllm/model_executor/models/roberta.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3579699Z #34 4.348 copying vllm/model_executor/models/rvl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3580731Z #34 4.348 copying vllm/model_executor/models/seed_oss.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3581784Z #34 4.349 copying vllm/model_executor/models/siglip.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.3582869Z #34 4.349 copying vllm/model_executor/models/siglip2navit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4128702Z #34 4.349 copying vllm/model_executor/models/skyworkr1v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4129857Z #34 4.349 copying vllm/model_executor/models/smolvlm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4130989Z #34 4.350 
copying vllm/model_executor/models/solar.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4132364Z #34 4.350 copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4133464Z #34 4.350 copying vllm/model_executor/models/starcoder2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4134538Z #34 4.350 copying vllm/model_executor/models/step3_text.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4135740Z #34 4.350 copying vllm/model_executor/models/step3_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4136785Z #34 4.351 copying vllm/model_executor/models/swin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4137820Z #34 4.351 copying vllm/model_executor/models/tarsier.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4138895Z #34 4.351 copying vllm/model_executor/models/telechat2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4139958Z #34 4.351 copying vllm/model_executor/models/teleflm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4141039Z #34 4.352 copying vllm/model_executor/models/terratorch.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4142147Z #34 4.352 copying vllm/model_executor/models/transformers.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4143342Z #34 4.352 copying vllm/model_executor/models/ultravox.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4144360Z #34 4.352 copying vllm/model_executor/models/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4145353Z #34 4.352 copying 
vllm/model_executor/models/vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4146366Z #34 4.353 copying vllm/model_executor/models/voxtral.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4147387Z #34 4.353 copying vllm/model_executor/models/whisper.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4148392Z #34 4.353 copying vllm/model_executor/models/zamba2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T07:02:54.4149595Z #34 4.354 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T07:02:54.4150461Z #34 4.354 copying vllm/model_executor/warmup/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T07:02:54.4151623Z #34 4.354 copying vllm/model_executor/warmup/deep_gemm_warmup.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T07:02:54.4152759Z #34 4.354 copying vllm/model_executor/warmup/kernel_warmup.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T07:02:54.4153693Z #34 4.355 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4154706Z #34 4.355 copying vllm/model_executor/layers/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4156019Z #34 4.355 copying vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4157429Z #34 4.355 copying vllm/model_executor/layers/fused_moe/batched_triton_or_deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4158788Z #34 4.355 copying vllm/model_executor/layers/fused_moe/config.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4160046Z #34 
4.356 copying vllm/model_executor/layers/fused_moe/cpu_fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4161425Z #34 4.356 copying vllm/model_executor/layers/fused_moe/cutlass_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4162712Z #34 4.356 copying vllm/model_executor/layers/fused_moe/deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4163963Z #34 4.356 copying vllm/model_executor/layers/fused_moe/deep_gemm_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4165351Z #34 4.357 copying vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4166791Z #34 4.357 copying vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4168156Z #34 4.357 copying vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4169593Z #34 4.357 copying vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4171057Z #34 4.357 copying vllm/model_executor/layers/fused_moe/fused_batched_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4172565Z #34 4.358 copying vllm/model_executor/layers/fused_moe/fused_marlin_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4173865Z #34 4.358 copying vllm/model_executor/layers/fused_moe/fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4175190Z #34 
4.358 copying vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4176517Z #34 4.358 copying vllm/model_executor/layers/fused_moe/layer.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4177798Z #34 4.359 copying vllm/model_executor/layers/fused_moe/modular_kernel.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4179126Z #34 4.359 copying vllm/model_executor/layers/fused_moe/moe_align_block_size.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4180449Z #34 4.359 copying vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4181820Z #34 4.359 copying vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4183293Z #34 4.360 copying vllm/model_executor/layers/fused_moe/moe_torch_iterative.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4184625Z #34 4.360 copying vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4185938Z #34 4.360 copying vllm/model_executor/layers/fused_moe/prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4187247Z #34 4.360 copying vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4188561Z #34 4.360 copying vllm/model_executor/layers/fused_moe/routing_simulator.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4189879Z #34 4.361 copying 
vllm/model_executor/layers/fused_moe/topk_weight_and_reduce.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4191208Z #34 4.361 copying vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4192520Z #34 4.361 copying vllm/model_executor/layers/fused_moe/trtllm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4193721Z #34 4.361 copying vllm/model_executor/layers/fused_moe/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T07:02:54.4194675Z #34 4.362 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4195616Z #34 4.362 copying vllm/model_executor/layers/mamba/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4196799Z #34 4.362 copying vllm/model_executor/layers/mamba/abstract.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4197972Z #34 4.362 copying vllm/model_executor/layers/mamba/linear_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4199163Z #34 4.362 copying vllm/model_executor/layers/mamba/mamba2_metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4200369Z #34 4.363 copying vllm/model_executor/layers/mamba/mamba_mixer.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4201552Z #34 4.363 copying vllm/model_executor/layers/mamba/mamba_mixer2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4202729Z #34 4.363 copying vllm/model_executor/layers/mamba/mamba_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4203903Z #34 4.363 copying 
vllm/model_executor/layers/mamba/short_conv.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T07:02:54.4204860Z #34 4.364 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4205887Z #34 4.364 copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4207188Z #34 4.364 copying vllm/model_executor/layers/quantization/auto_round.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4208462Z #34 4.364 copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4209748Z #34 4.364 copying vllm/model_executor/layers/quantization/awq_marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4211135Z #34 4.365 copying vllm/model_executor/layers/quantization/awq_triton.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4212695Z #34 4.365 copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4214064Z #34 4.365 copying vllm/model_executor/layers/quantization/bitblas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4215443Z #34 4.365 copying vllm/model_executor/layers/quantization/bitsandbytes.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4216835Z #34 4.366 copying vllm/model_executor/layers/quantization/deepgemm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4218223Z #34 4.366 copying vllm/model_executor/layers/quantization/deepspeedfp.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4219612Z #34 4.366 copying vllm/model_executor/layers/quantization/experts_int8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4220995Z #34 4.366 copying vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4222329Z #34 4.366 copying vllm/model_executor/layers/quantization/fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4223745Z #34 4.367 copying vllm/model_executor/layers/quantization/gguf.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4225010Z #34 4.367 copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4226361Z #34 4.367 copying vllm/model_executor/layers/quantization/gptq_bitblas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4227699Z #34 4.367 copying vllm/model_executor/layers/quantization/gptq_marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4229115Z #34 4.368 copying vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4230396Z #34 4.368 copying vllm/model_executor/layers/quantization/hqq_marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4231643Z #34 4.368 copying vllm/model_executor/layers/quantization/inc.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4232915Z #34 4.368 copying vllm/model_executor/layers/quantization/input_quant_fp8.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4234213Z #34 4.369 copying vllm/model_executor/layers/quantization/ipex_quant.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4235487Z #34 4.369 copying vllm/model_executor/layers/quantization/kv_cache.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4236761Z #34 4.369 copying vllm/model_executor/layers/quantization/modelopt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4238031Z #34 4.369 copying vllm/model_executor/layers/quantization/moe_wna16.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4239286Z #34 4.369 copying vllm/model_executor/layers/quantization/mxfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4240515Z #34 4.370 copying vllm/model_executor/layers/quantization/petit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4241781Z #34 4.370 copying vllm/model_executor/layers/quantization/ptpc_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4243054Z #34 4.370 copying vllm/model_executor/layers/quantization/rtn.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4244280Z #34 4.370 copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4245545Z #34 4.371 copying vllm/model_executor/layers/quantization/torchao.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4246814Z #34 4.371 copying vllm/model_executor/layers/quantization/tpu_int8.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T07:02:54.4247835Z #34 4.371 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4249049Z #34 4.371 copying vllm/model_executor/layers/rotary_embedding/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4250595Z #34 4.371 copying vllm/model_executor/layers/rotary_embedding/base.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4252069Z #34 4.372 copying vllm/model_executor/layers/rotary_embedding/common.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4253614Z #34 4.372 copying vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4255115Z #34 4.372 copying vllm/model_executor/layers/rotary_embedding/dual_chunk_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4256684Z #34 4.372 copying vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4258290Z #34 4.373 copying vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4259798Z #34 4.373 copying vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4261294Z #34 4.373 copying vllm/model_executor/layers/rotary_embedding/linear_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4262875Z #34 4.373 copying 
vllm/model_executor/layers/rotary_embedding/llama3_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4264359Z #34 4.374 copying vllm/model_executor/layers/rotary_embedding/llama4_vision_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4265730Z #34 4.374 copying vllm/model_executor/layers/rotary_embedding/mrope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4267071Z #34 4.374 copying vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4268508Z #34 4.374 copying vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4270145Z #34 4.375 copying vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T07:02:54.4271260Z #34 4.375 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe 2025-09-07T07:02:54.4272333Z #34 4.375 copying vllm/model_executor/layers/shared_fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe 2025-09-07T07:02:54.4273741Z #34 4.375 copying vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe 2025-09-07T07:02:54.4274817Z #34 4.376 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4275808Z #34 4.376 copying vllm/model_executor/layers/mamba/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4277058Z #34 4.376 copying 
vllm/model_executor/layers/mamba/ops/causal_conv1d.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4278437Z #34 4.376 copying vllm/model_executor/layers/mamba/ops/layernorm_gated.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4279729Z #34 4.377 copying vllm/model_executor/layers/mamba/ops/mamba_ssm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4281056Z #34 4.377 copying vllm/model_executor/layers/mamba/ops/ssd_bmm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4282270Z #34 4.377 copying vllm/model_executor/layers/mamba/ops/ssd_chunk_scan.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4283561Z #34 4.377 copying vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4284804Z #34 4.378 copying vllm/model_executor/layers/mamba/ops/ssd_combined.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4286060Z #34 4.378 copying vllm/model_executor/layers/mamba/ops/ssd_state_passing.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T07:02:54.4287257Z #34 4.378 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T07:02:54.4288542Z #34 4.378 copying vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T07:02:54.4290248Z #34 4.378 copying vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T07:02:54.4292328Z #34 4.379 copying 
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T07:02:54.4294184Z #34 4.379 copying vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T07:02:54.4295972Z #34 4.379 copying vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T07:02:54.4297270Z #34 4.379 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels 2025-09-07T07:02:54.4298470Z #34 4.380 copying vllm/model_executor/layers/quantization/kernels/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels 2025-09-07T07:02:54.4299660Z #34 4.380 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T07:02:54.4300822Z #34 4.380 copying vllm/model_executor/layers/quantization/quark/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T07:02:54.4302287Z #34 4.380 copying vllm/model_executor/layers/quantization/quark/quark.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T07:02:54.4303973Z #34 4.380 copying vllm/model_executor/layers/quantization/quark/quark_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T07:02:54.4305308Z #34 4.381 copying vllm/model_executor/layers/quantization/quark/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T07:02:54.4306345Z #34 4.381 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4307372Z #34 4.381 copying 
vllm/model_executor/layers/quantization/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4308715Z #34 4.381 copying vllm/model_executor/layers/quantization/utils/allspark_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4310096Z #34 4.382 copying vllm/model_executor/layers/quantization/utils/bitblas_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4311491Z #34 4.382 copying vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4312910Z #34 4.382 copying vllm/model_executor/layers/quantization/utils/flashinfer_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4314300Z #34 4.382 copying vllm/model_executor/layers/quantization/utils/fp8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4315622Z #34 4.383 copying vllm/model_executor/layers/quantization/utils/gptq_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4316957Z #34 4.383 copying vllm/model_executor/layers/quantization/utils/int8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4318354Z #34 4.383 copying vllm/model_executor/layers/quantization/utils/layer_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4319703Z #34 4.383 copying vllm/model_executor/layers/quantization/utils/machete_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4321072Z #34 4.383 copying vllm/model_executor/layers/quantization/utils/marlin_utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4322450Z #34 4.384 copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4323827Z #34 4.384 copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4325227Z #34 4.384 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4326624Z #34 4.384 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4328006Z #34 4.385 copying vllm/model_executor/layers/quantization/utils/mxfp4_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4329355Z #34 4.385 copying vllm/model_executor/layers/quantization/utils/mxfp8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4330734Z #34 4.385 copying vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4332520Z #34 4.385 copying vllm/model_executor/layers/quantization/utils/nvfp4_moe_support.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4334110Z #34 4.385 copying vllm/model_executor/layers/quantization/utils/petit_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4335617Z #34 4.386 copying vllm/model_executor/layers/quantization/utils/quant_utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4337126Z #34 4.386 copying vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T07:02:54.4338429Z #34 4.386 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4339927Z #34 4.386 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4341929Z #34 4.387 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4344096Z #34 4.387 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4345995Z #34 4.387 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4347897Z #34 4.387 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4350270Z #34 4.387 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4352402Z #34 4.388 copying 
vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4354530Z #34 4.388 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4356668Z #34 4.388 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4358790Z #34 4.388 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4360922Z #34 4.389 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4363144Z #34 4.389 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T07:02:54.4364559Z #34 4.389 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T07:02:54.4365933Z #34 4.389 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/linear.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T07:02:54.4367745Z #34 4.390 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/module.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T07:02:54.4369482Z #34 4.390 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T07:02:54.4370945Z #34 4.390 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T07:02:54.4372830Z #34 4.390 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T07:02:54.4374595Z #34 4.391 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4376124Z #34 4.391 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4378037Z #34 4.391 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4379974Z #34 4.391 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4381844Z #34 4.391 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4383973Z #34 4.391 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/conch.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4385630Z #34 4.392 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4387299Z #34 4.392 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4388981Z #34 4.392 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4390641Z #34 4.392 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4392297Z #34 4.393 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T07:02:54.4393566Z #34 4.393 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4394858Z #34 4.393 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4396475Z #34 4.393 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4397998Z #34 4.394 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4399528Z #34 4.394 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4401049Z #34 4.394 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4402592Z #34 4.394 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4404094Z #34 4.395 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/xla.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T07:02:54.4405266Z #34 4.395 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T07:02:54.4406430Z #34 4.395 copying vllm/model_executor/layers/quantization/quark/schemes/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T07:02:54.4407911Z #34 4.395 copying vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T07:02:54.4409479Z #34 4.396 copying vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T07:02:54.4411096Z #34 4.396 copying vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T07:02:54.4412952Z #34 4.396 copying 
vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T07:02:54.4414229Z #34 4.397 creating build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors 2025-09-07T07:02:54.4415087Z #34 4.397 copying vllm/plugins/io_processors/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors 2025-09-07T07:02:54.4416155Z #34 4.397 copying vllm/plugins/io_processors/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors 2025-09-07T07:02:54.4417049Z #34 4.397 creating build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T07:02:54.4417924Z #34 4.397 copying vllm/plugins/lora_resolvers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T07:02:54.4419070Z #34 4.398 copying vllm/plugins/lora_resolvers/filesystem_resolver.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T07:02:54.4420068Z #34 4.398 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:54.4421115Z #34 4.398 copying vllm/transformers_utils/chat_templates/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:54.4422416Z #34 4.398 copying vllm/transformers_utils/chat_templates/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:54.4423527Z #34 4.399 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4424504Z #34 4.399 copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4425531Z #34 4.399 copying vllm/transformers_utils/configs/arctic.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4426551Z #34 4.399 copying 
vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4427611Z #34 4.399 copying vllm/transformers_utils/configs/deepseek_vl2.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4428680Z #34 4.399 copying vllm/transformers_utils/configs/eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4429703Z #34 4.400 copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4430712Z #34 4.400 copying vllm/transformers_utils/configs/jais.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4431714Z #34 4.400 copying vllm/transformers_utils/configs/kimi_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4432730Z #34 4.400 copying vllm/transformers_utils/configs/medusa.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4433784Z #34 4.401 copying vllm/transformers_utils/configs/midashenglm.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4434842Z #34 4.401 copying vllm/transformers_utils/configs/mistral.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4435911Z #34 4.401 copying vllm/transformers_utils/configs/mlp_speculator.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4436964Z #34 4.401 copying vllm/transformers_utils/configs/moonvit.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4438033Z #34 4.401 copying vllm/transformers_utils/configs/nemotron.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4439093Z #34 4.402 copying vllm/transformers_utils/configs/nemotron_h.py -> 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4440149Z #34 4.402 copying vllm/transformers_utils/configs/nemotron_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4441236Z #34 4.402 copying vllm/transformers_utils/configs/ovis.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4442247Z #34 4.402 copying vllm/transformers_utils/configs/step3_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4443289Z #34 4.402 copying vllm/transformers_utils/configs/ultravox.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T07:02:54.4444156Z #34 4.403 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T07:02:54.4445030Z #34 4.403 copying vllm/transformers_utils/processors/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T07:02:54.4446149Z #34 4.403 copying vllm/transformers_utils/processors/deepseek_vl2.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T07:02:54.4447262Z #34 4.403 copying vllm/transformers_utils/processors/ovis.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T07:02:54.4448335Z #34 4.403 copying vllm/transformers_utils/processors/ovis2_5.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T07:02:54.4449569Z #34 4.404 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers 2025-09-07T07:02:54.4450557Z #34 4.404 copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers 2025-09-07T07:02:54.4451873Z #34 4.404 copying vllm/transformers_utils/tokenizers/mistral.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers 2025-09-07T07:02:54.4452915Z #34 4.404 creating 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T07:02:54.4454042Z #34 4.405 copying vllm/transformers_utils/configs/speculators/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T07:02:54.4455482Z #34 4.405 copying vllm/transformers_utils/configs/speculators/algos.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T07:02:54.4456961Z #34 4.405 copying vllm/transformers_utils/configs/speculators/base.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T07:02:54.4457982Z #34 4.405 creating build/lib.linux-x86_64-cpython-312/vllm/v1/attention 2025-09-07T07:02:54.4458717Z #34 4.405 copying vllm/v1/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention 2025-09-07T07:02:54.4459427Z #34 4.406 creating build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4460090Z #34 4.406 copying vllm/v1/core/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4460857Z #34 4.406 copying vllm/v1/core/block_pool.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4461720Z #34 4.406 copying vllm/v1/core/encoder_cache_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4462736Z #34 4.406 copying vllm/v1/core/kv_cache_coordinator.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4463638Z #34 4.407 copying vllm/v1/core/kv_cache_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4464387Z #34 4.407 copying vllm/v1/core/kv_cache_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4465178Z #34 4.407 copying vllm/v1/core/single_type_kv_cache_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T07:02:54.4465912Z #34 4.408 creating build/lib.linux-x86_64-cpython-312/vllm/v1/engine 
2025-09-07T07:02:54.4466525Z #34 4.408 copying vllm/v1/engine/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4467232Z #34 4.408 copying vllm/v1/engine/async_llm.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4467987Z #34 4.408 copying vllm/v1/engine/coordinator.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4468783Z #34 4.408 copying vllm/v1/engine/core.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4469508Z #34 4.409 copying vllm/v1/engine/core_client.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4470272Z #34 4.409 copying vllm/v1/engine/detokenizer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4471031Z #34 4.409 copying vllm/v1/engine/exceptions.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4471791Z #34 4.409 copying vllm/v1/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4472528Z #34 4.409 copying vllm/v1/engine/logprobs.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4473311Z #34 4.410 copying vllm/v1/engine/output_processor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4474132Z #34 4.410 copying vllm/v1/engine/parallel_sampling.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4474918Z #34 4.410 copying vllm/v1/engine/processor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4475655Z #34 4.410 copying vllm/v1/engine/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T07:02:54.4476263Z #34 4.411 creating build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T07:02:54.4476896Z #34 4.411 copying vllm/v1/executor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T07:02:54.4477660Z #34 4.411 copying vllm/v1/executor/abstract.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T07:02:54.4478480Z #34 4.411 copying vllm/v1/executor/multiproc_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T07:02:54.4479380Z #34 4.411 copying vllm/v1/executor/ray_distributed_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T07:02:54.4480094Z #34 4.412 creating build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4480715Z #34 4.412 copying vllm/v1/metrics/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4481478Z #34 4.412 copying vllm/v1/metrics/loggers.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4482248Z #34 4.412 copying vllm/v1/metrics/prometheus.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4483035Z #34 4.413 copying vllm/v1/metrics/ray_wrappers.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4483788Z #34 4.413 copying vllm/v1/metrics/reader.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4484522Z #34 4.413 copying vllm/v1/metrics/stats.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T07:02:54.4485129Z #34 4.413 creating build/lib.linux-x86_64-cpython-312/vllm/v1/pool 2025-09-07T07:02:54.4485727Z #34 4.413 copying vllm/v1/pool/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/pool 2025-09-07T07:02:54.4486428Z #34 4.414 copying vllm/v1/pool/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/pool 2025-09-07T07:02:54.4487034Z #34 4.414 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T07:02:54.4487652Z #34 4.414 copying vllm/v1/sample/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T07:02:54.4488366Z #34 4.414 copying vllm/v1/sample/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T07:02:54.4489154Z #34 4.414 copying vllm/v1/sample/rejection_sampler.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T07:02:54.4489965Z #34 4.415 copying vllm/v1/sample/sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T07:02:54.4490606Z #34 4.415 creating build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4491539Z #34 4.415 copying vllm/v1/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4492450Z #34 4.415 copying vllm/v1/spec_decode/eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4493368Z #34 4.416 copying vllm/v1/spec_decode/medusa.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4494282Z #34 4.416 copying vllm/v1/spec_decode/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4495181Z #34 4.416 copying vllm/v1/spec_decode/metrics.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4496121Z #34 4.416 copying vllm/v1/spec_decode/ngram_proposer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4497042Z #34 4.416 copying vllm/v1/spec_decode/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T07:02:54.4497830Z #34 4.417 creating build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4498695Z #34 4.417 copying vllm/v1/structured_output/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4499780Z #34 4.417 copying vllm/v1/structured_output/backend_guidance.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4500969Z #34 4.417 copying vllm/v1/structured_output/backend_lm_format_enforcer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4502142Z #34 4.418 copying vllm/v1/structured_output/backend_outlines.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 
2025-09-07T07:02:54.4503255Z #34 4.418 copying vllm/v1/structured_output/backend_types.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4504402Z #34 4.418 copying vllm/v1/structured_output/backend_xgrammar.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4505359Z #34 4.418 copying vllm/v1/structured_output/request.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4506278Z #34 4.418 copying vllm/v1/structured_output/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T07:02:54.4506980Z #34 4.419 creating build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4507616Z #34 4.419 copying vllm/v1/worker/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4508350Z #34 4.419 copying vllm/v1/worker/block_table.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4509114Z #34 4.419 copying vllm/v1/worker/cpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4509886Z #34 4.419 copying vllm/v1/worker/cpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4510640Z #34 4.420 copying vllm/v1/worker/gpu_input_batch.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4511427Z #34 4.420 copying vllm/v1/worker/gpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4512198Z #34 4.420 copying vllm/v1/worker/gpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4513031Z #34 4.421 copying vllm/v1/worker/kv_connector_model_runner_mixin.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4513933Z #34 4.421 copying vllm/v1/worker/lora_model_runner_mixin.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4514742Z #34 4.421 copying vllm/v1/worker/tpu_input_batch.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4515529Z #34 4.421 copying vllm/v1/worker/tpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4516327Z #34 4.422 copying vllm/v1/worker/tpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4517048Z #34 4.422 copying vllm/v1/worker/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4517781Z #34 4.422 copying vllm/v1/worker/worker_base.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4519082Z #34 4.422 copying vllm/v1/worker/xpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4519850Z #34 4.422 copying vllm/v1/worker/xpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T07:02:54.4520532Z #34 4.423 creating build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4521302Z #34 4.423 copying vllm/v1/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4522255Z #34 4.423 copying vllm/v1/attention/backends/cpu_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4523219Z #34 4.423 copying vllm/v1/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4524222Z #34 4.424 copying vllm/v1/attention/backends/flashinfer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4525249Z #34 4.424 copying vllm/v1/attention/backends/flex_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4526255Z #34 4.424 copying vllm/v1/attention/backends/linear_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4527254Z #34 4.424 copying vllm/v1/attention/backends/mamba1_attn.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4528238Z #34 4.425 copying vllm/v1/attention/backends/mamba2_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4529234Z #34 4.425 copying vllm/v1/attention/backends/mamba_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4530208Z #34 4.425 copying vllm/v1/attention/backends/pallas.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4531419Z #34 4.425 copying vllm/v1/attention/backends/rocm_aiter_fa.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4532575Z #34 4.425 copying vllm/v1/attention/backends/short_conv_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4533746Z #34 4.426 copying vllm/v1/attention/backends/tree_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4534848Z #34 4.426 copying vllm/v1/attention/backends/triton_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4535947Z #34 4.426 copying vllm/v1/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4537028Z #34 4.426 copying vllm/v1/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T07:02:54.4537939Z #34 4.427 creating build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4538873Z #34 4.427 copying vllm/v1/attention/backends/mla/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4540024Z #34 4.427 copying vllm/v1/attention/backends/mla/common.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4541226Z #34 4.427 copying vllm/v1/attention/backends/mla/cutlass_mla.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4542445Z #34 4.427 copying vllm/v1/attention/backends/mla/flashattn_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4543772Z #34 4.428 copying vllm/v1/attention/backends/mla/flashmla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4544881Z #34 4.428 copying vllm/v1/attention/backends/mla/rocm_aiter_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4545946Z #34 4.428 copying vllm/v1/attention/backends/mla/triton_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T07:02:54.4546779Z #34 4.429 creating build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4547457Z #34 4.429 copying vllm/v1/core/sched/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4548281Z #34 4.429 copying vllm/v1/core/sched/async_scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4549454Z #34 4.429 copying vllm/v1/core/sched/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4550350Z #34 4.429 copying vllm/v1/core/sched/output.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4551278Z #34 4.429 copying vllm/v1/core/sched/request_queue.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4552206Z #34 4.430 copying vllm/v1/core/sched/scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4553104Z #34 4.430 copying vllm/v1/core/sched/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T07:02:54.4553907Z #34 4.430 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T07:02:54.4554835Z #34 4.430 copying vllm/v1/sample/logits_processor/__init__.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T07:02:54.4555989Z #34 4.431 copying vllm/v1/sample/logits_processor/builtin.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T07:02:54.4557152Z #34 4.431 copying vllm/v1/sample/logits_processor/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T07:02:54.4558318Z #34 4.431 copying vllm/v1/sample/logits_processor/state.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T07:02:54.4559196Z #34 4.431 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T07:02:54.4559933Z #34 4.431 copying vllm/v1/sample/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T07:02:54.4560826Z #34 4.432 copying vllm/v1/sample/ops/bad_words.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T07:02:54.4561824Z #34 4.432 copying vllm/v1/sample/ops/logprobs.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T07:02:54.4562756Z #34 4.432 copying vllm/v1/sample/ops/penalties.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T07:02:54.4563670Z #34 4.432 copying vllm/v1/sample/ops/topk_topp_sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T07:02:54.4564419Z #34 4.433 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T07:02:54.4565121Z #34 4.433 copying vllm/v1/sample/tpu/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T07:02:54.4565950Z #34 4.433 copying vllm/v1/sample/tpu/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T07:02:54.4566805Z #34 4.433 copying vllm/v1/sample/tpu/sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T07:02:54.4567405Z #34 4.434 running egg_info 2025-09-07T07:02:54.4567687Z #34 4.446 creating vllm.egg-info 2025-09-07T07:02:54.4568022Z #34 4.446 
writing vllm.egg-info/PKG-INFO 2025-09-07T07:02:54.4568483Z #34 4.448 writing dependency_links to vllm.egg-info/dependency_links.txt 2025-09-07T07:02:54.4569045Z #34 4.448 writing entry points to vllm.egg-info/entry_points.txt 2025-09-07T07:02:54.4569530Z #34 4.451 writing requirements to vllm.egg-info/requires.txt 2025-09-07T07:02:54.5645158Z #34 4.451 writing top-level names to vllm.egg-info/top_level.txt 2025-09-07T07:02:54.5645774Z #34 4.451 writing manifest file 'vllm.egg-info/SOURCES.txt' 2025-09-07T07:02:54.8293070Z #34 4.867 reading manifest template 'MANIFEST.in' 2025-09-07T07:02:54.9293189Z #34 4.874 adding license file 'LICENSE' 2025-09-07T07:02:54.9293710Z #34 4.904 writing manifest file 'vllm.egg-info/SOURCES.txt' 2025-09-07T07:02:54.9294301Z #34 4.935 copying vllm/py.typed -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T07:02:54.9295273Z #34 4.935 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9296814Z #34 4.935 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9298727Z #34 4.936 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9300612Z #34 4.936 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9302594Z #34 4.936 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9304757Z #34 4.936 copying 
vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9306873Z #34 4.937 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9308756Z #34 4.937 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9310553Z #34 4.937 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9312299Z #34 4.937 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9314204Z #34 4.938 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9316045Z #34 4.938 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9317975Z #34 4.938 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9319895Z #34 4.938 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9321690Z #34 4.938 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9323359Z #34 4.939 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9325324Z #34 4.939 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9327144Z #34 4.939 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9328813Z #34 4.939 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9330664Z #34 4.940 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9333122Z #34 4.940 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9335323Z #34 4.940 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9337346Z #34 4.940 copying 
vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9339153Z #34 4.941 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9340965Z #34 4.941 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9343020Z #34 4.941 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9345050Z #34 4.941 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9346756Z #34 4.942 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9348587Z #34 4.942 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9350889Z #34 4.942 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9352811Z #34 4.942 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9355001Z #34 4.942 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9357075Z #34 4.943 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9359161Z #34 4.943 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9361026Z #34 4.943 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9362944Z #34 4.943 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9364970Z #34 4.944 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9366724Z #34 4.944 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9368348Z #34 4.944 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9370138Z #34 
4.944 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9372053Z #34 4.945 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9373694Z #34 4.945 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9375413Z #34 4.945 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9377140Z #34 4.945 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9378866Z #34 4.945 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9380764Z #34 4.946 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9382571Z #34 4.946 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9384464Z #34 4.946 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9386214Z #34 4.946 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9387880Z #34 4.947 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9389551Z #34 4.947 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9391324Z #34 4.947 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9393123Z #34 4.947 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9394917Z #34 4.947 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9396798Z #34 4.948 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9398546Z #34 4.948 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9411184Z #34 4.948 
copying vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9413381Z #34 4.948 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9415201Z #34 4.949 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9417003Z #34 4.949 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9418852Z #34 4.949 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9420623Z #34 4.949 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9422322Z #34 4.949 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9424157Z #34 4.950 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9425884Z #34 4.950 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9427754Z #34 4.950 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9429627Z #34 4.950 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9431484Z #34 4.951 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9433386Z #34 4.951 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9435247Z #34 4.951 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9437096Z #34 4.951 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9439026Z #34 4.951 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9440931Z #34 4.952 copying 
vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9442908Z #34 4.952 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9444793Z #34 4.952 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9446691Z #34 4.952 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9448583Z #34 4.953 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9450921Z #34 4.953 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9452896Z #34 4.953 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9454900Z #34 4.953 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T07:02:54.9456858Z #34 4.954 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9458859Z #34 4.954 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9460871Z #34 4.954 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9462941Z #34 4.954 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9464850Z #34 4.954 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9466712Z #34 4.955 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9468567Z #34 4.955 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9470526Z #34 4.955 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9472386Z #34 4.955 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9474297Z #34 4.956 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9476103Z #34 4.956 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9477875Z #34 4.956 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9479742Z #34 4.956 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9481595Z #34 4.957 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9483355Z #34 4.957 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9485045Z #34 4.957 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9486692Z #34 4.957 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9488347Z #34 4.957 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9490007Z #34 4.958 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9491925Z #34 4.958 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9493654Z #34 4.958 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9495384Z #34 4.958 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9497190Z #34 4.958 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9498972Z #34 4.959 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9500770Z #34 4.959 copying 
vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9502455Z #34 4.959 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9504242Z #34 4.959 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9505978Z #34 4.960 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9507717Z #34 4.960 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9509369Z #34 4.960 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9511006Z #34 4.960 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9512636Z #34 4.961 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9514314Z #34 4.961 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9516069Z #34 4.961 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9517736Z #34 4.961 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9519374Z #34 4.961 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9521007Z #34 4.962 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9522621Z #34 4.962 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9524243Z #34 4.962 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9525913Z #34 4.962 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9527666Z #34 4.963 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9529452Z #34 4.963 copying 
vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9531495Z #34 4.963 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9533216Z #34 4.963 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9534910Z #34 4.964 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9536597Z #34 4.964 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9538261Z #34 4.964 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9539878Z #34 4.964 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9541543Z #34 4.965 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9543241Z #34 4.965 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9545079Z 
#34 4.965 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9546803Z #34 4.965 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9548549Z #34 4.965 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9550709Z #34 4.966 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9552495Z #34 4.966 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9554297Z #34 4.966 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9556007Z #34 4.966 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9557755Z #34 4.966 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9559972Z #34 4.967 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9561864Z #34 4.967 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9563676Z #34 4.967 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9565386Z #34 4.967 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:54.9567104Z #34 4.968 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0295954Z #34 4.968 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0297832Z #34 4.968 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0299566Z #34 4.968 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0301310Z #34 4.969 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0303041Z #34 4.969 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0304832Z #34 4.969 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0306624Z #34 4.969 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0308320Z #34 4.969 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0310029Z #34 4.970 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0311748Z #34 4.970 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0313467Z #34 4.970 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0315122Z #34 4.970 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0316845Z #34 4.971 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0318619Z #34 4.971 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0320368Z #34 4.971 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0322257Z #34 4.971 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0323901Z #34 4.971 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0325571Z #34 4.972 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0327294Z #34 4.972 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0329015Z #34 4.972 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0330714Z #34 4.972 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0332700Z #34 4.973 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0334438Z #34 4.973 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0336220Z #34 4.973 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0338117Z #34 4.973 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0339897Z #34 4.973 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0341624Z #34 4.974 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0343430Z #34 4.974 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0345016Z #34 4.974 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0346686Z #34 4.974 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0348433Z #34 4.975 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0350517Z #34 4.975 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0352276Z #34 4.975 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0354093Z #34 4.975 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0355879Z #34 4.975 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0357641Z #34 4.976 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0359374Z #34 4.976 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0361086Z #34 4.976 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0362879Z #34 4.976 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0364594Z #34 4.977 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0366311Z #34 4.977 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0368012Z #34 4.977 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0369719Z #34 4.977 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0371687Z #34 4.978 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0373452Z #34 4.978 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0375188Z #34 4.978 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0376887Z #34 4.978 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0378606Z #34 4.978 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0380436Z #34 4.979 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0382209Z #34 4.979 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0384064Z #34 4.979 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0385843Z #34 4.979 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0387588Z #34 4.980 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0388894Z #34 4.980 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0390478Z #34 4.980 copying vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0392673Z #34 4.980 copying 
vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0394834Z #34 4.980 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0397032Z #34 4.981 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0399243Z #34 4.981 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0401493Z #34 4.981 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0403706Z #34 4.981 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0405916Z #34 4.982 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0408068Z #34 4.982 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0410175Z #34 4.982 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0412663Z #34 4.982 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0414930Z #34 4.982 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0417259Z #34 4.983 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0419550Z #34 4.983 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0421845Z #34 4.983 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0424208Z #34 4.983 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0426351Z #34 4.984 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0428456Z #34 4.984 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0430574Z #34 4.984 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0432675Z #34 4.984 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0434861Z #34 4.985 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0437044Z #34 4.985 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0439226Z #34 4.985 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0441425Z #34 4.985 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0443642Z #34 4.985 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0445872Z #34 4.986 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0448002Z #34 4.986 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0450534Z #34 4.986 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0452787Z #34 4.986 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0455014Z #34 4.986 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0457230Z #34 4.987 copying vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0459466Z #34 4.987 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0461738Z #34 4.987 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0464070Z #34 4.987 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0466357Z #34 4.988 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0468582Z #34 4.988 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0470792Z #34 4.988 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0472933Z #34 4.988 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0475059Z #34 4.988 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0477178Z #34 4.989 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0479374Z #34 4.989 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0481551Z #34 4.989 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0483805Z #34 4.989 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0486011Z #34 4.990 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0488214Z #34 4.990 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0490456Z #34 4.990 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0492993Z #34 4.990 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0495237Z #34 4.990 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0497485Z #34 4.991 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0499784Z #34 4.991 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0501966Z #34 4.991 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0504241Z #34 4.991 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0506363Z #34 4.992 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0508495Z #34 4.992 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0510669Z #34 4.992 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0512845Z #34 4.992 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0515082Z #34 4.992 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0517273Z #34 4.993 copying 
vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0519573Z #34 4.993 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0521773Z #34 4.993 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0523914Z #34 4.993 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0526010Z #34 4.994 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0528168Z #34 4.994 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0530363Z #34 4.994 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0532885Z #34 4.994 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0535125Z #34 4.995 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0537314Z #34 4.995 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0539488Z #34 4.995 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0541679Z #34 4.995 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0544020Z #34 4.995 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0546213Z #34 4.996 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0548457Z #34 4.996 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0551033Z #34 4.996 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0553263Z #34 4.996 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0555479Z #34 4.997 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0557680Z #34 4.997 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0559867Z #34 4.997 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0562179Z #34 4.997 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0564360Z #34 4.997 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0566620Z #34 4.998 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0568841Z #34 4.998 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0571120Z #34 4.998 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0573542Z #34 4.998 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0575768Z #34 4.999 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0578027Z #34 4.999 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0580218Z #34 4.999 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0582481Z #34 4.999 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0584738Z #34 4.999 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0586840Z #34 5.000 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0589008Z #34 5.000 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0591218Z #34 5.000 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0593416Z #34 5.000 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0595632Z #34 5.001 copying 
vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0597836Z #34 5.001 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0599987Z #34 5.001 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0602171Z #34 5.001 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0604361Z #34 5.001 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0606516Z #34 5.002 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0608680Z #34 5.002 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0610824Z #34 5.002 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0613270Z #34 5.002 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0615483Z #34 5.003 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0617698Z #34 5.003 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0619919Z #34 5.003 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0622151Z #34 5.003 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0624482Z #34 5.004 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0626689Z #34 5.004 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0628849Z #34 5.004 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0631043Z #34 5.004 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0633201Z #34 5.004 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0635327Z #34 5.005 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0637426Z #34 5.005 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0639576Z #34 5.005 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0641789Z #34 5.005 copying 
vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0643979Z #34 5.006 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0646174Z #34 5.006 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0648281Z #34 5.006 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0650760Z #34 5.006 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0652986Z #34 5.006 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0655202Z #34 5.007 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0657451Z #34 5.007 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0659716Z #34 5.007 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0661982Z #34 5.007 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0664388Z #34 5.007 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0666552Z #34 5.008 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0668687Z #34 5.008 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0670836Z #34 5.008 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0672950Z #34 5.008 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0675091Z #34 5.009 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0677192Z #34 5.009 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0679372Z #34 5.009 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0681528Z #34 5.009 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0683713Z #34 5.010 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0685918Z #34 5.010 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0688143Z #34 5.010 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0690361Z #34 5.010 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0692857Z #34 5.010 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0695084Z #34 5.011 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0697289Z #34 5.011 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0699482Z #34 5.011 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0701706Z #34 5.011 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0704071Z #34 5.012 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0706259Z #34 5.012 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0708520Z #34 5.012 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0710751Z #34 5.012 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0713003Z #34 5.013 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0715152Z #34 5.013 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0717259Z #34 5.013 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0719382Z #34 5.013 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0721523Z #34 5.013 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0723699Z #34 5.014 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0725883Z #34 5.014 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0728105Z #34 5.014 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0730323Z #34 5.014 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0732794Z #34 5.015 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0734993Z #34 5.015 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0737165Z #34 5.015 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0739422Z #34 5.015 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0741696Z #34 5.016 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0744101Z #34 5.016 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0746358Z #34 5.016 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0748592Z #34 5.016 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0751132Z #34 5.017 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0753379Z #34 5.017 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0755607Z #34 5.017 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0757805Z #34 5.017 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0759990Z #34 5.018 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0762326Z #34 5.018 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0764459Z #34 5.018 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0766617Z #34 5.018 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0768826Z #34 5.018 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0771104Z #34 5.019 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0773605Z #34 5.019 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0775922Z #34 5.019 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0778249Z #34 5.019 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0780497Z #34 5.019 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0782723Z #34 5.020 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0784992Z #34 5.020 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0787123Z #34 5.020 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0789239Z #34 5.020 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0791360Z #34 5.021 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0793558Z #34 5.021 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0795818Z #34 5.021 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0798027Z #34 5.021 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0800197Z #34 5.022 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0802359Z #34 5.022 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0804498Z #34 5.022 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0806645Z #34 5.022 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0808756Z #34 5.022 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0811010Z #34 5.023 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0813440Z #34 5.023 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0815710Z #34 5.023 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0817943Z #34 5.023 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0820158Z #34 5.024 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0822389Z #34 5.024 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0824643Z #34 5.024 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0826781Z #34 5.025 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0828932Z #34 5.025 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0831118Z #34 5.025 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0833293Z #34 5.025 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0835451Z #34 5.025 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0837563Z #34 5.026 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0839693Z #34 5.026 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0841783Z #34 5.026 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0843993Z #34 5.026 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0846188Z #34 5.026 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0848380Z #34 5.027 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0851014Z #34 5.027 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0853290Z #34 5.027 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0855562Z #34 5.027 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.0857003Z #34 5.028 creating build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T07:02:55.0857777Z #34 5.028 copying vllm/vllm_flash_attn/.gitkeep -> build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T07:02:55.0858775Z #34 5.028 copying vllm/distributed/kv_transfer/README.md -> 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T07:02:55.0860034Z #34 5.028 copying vllm/distributed/kv_transfer/disagg_prefill_workflow.jpg -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T07:02:55.0861646Z #34 5.029 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0863540Z #34 5.029 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0865292Z #34 5.029 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0867043Z #34 5.029 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0868776Z #34 5.030 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0870636Z #34 5.030 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0872362Z #34 5.030 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0874127Z #34 5.030 copying 
vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0875915Z #34 5.031 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0877664Z #34 5.031 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0879395Z #34 5.031 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0881134Z #34 5.031 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0882873Z #34 5.032 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0884538Z #34 5.032 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0886219Z #34 5.032 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0887851Z #34 5.032 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0889432Z #34 5.033 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0891129Z #34 5.033 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0893047Z #34 5.033 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0894958Z #34 5.033 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0896921Z #34 5.034 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0898853Z #34 5.034 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0900633Z #34 5.034 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0902300Z #34 5.034 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0904160Z #34 5.035 copying 
vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0905935Z #34 5.035 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0907563Z #34 5.035 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0909250Z #34 5.035 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0910986Z #34 5.036 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0912846Z #34 5.036 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0914746Z #34 5.036 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0916586Z #34 5.036 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0918450Z #34 5.037 copying 
vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0920179Z #34 5.037 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0921927Z #34 5.037 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0923644Z #34 5.038 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0925229Z #34 5.038 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0926842Z #34 5.038 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0928515Z #34 5.038 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0930157Z #34 5.039 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0932022Z #34 5.039 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T07:02:55.0933714Z #34 5.039 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0935438Z #34 5.039 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0937210Z #34 5.039 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0939010Z #34 5.040 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0940825Z #34 5.040 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0942633Z #34 5.040 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0944519Z #34 5.040 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0946197Z #34 5.041 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0947864Z #34 5.041 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0949946Z #34 5.041 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0951820Z #34 5.041 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0953738Z #34 5.042 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0955591Z #34 5.042 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0957394Z #34 5.042 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0959180Z #34 5.042 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0961031Z #34 5.043 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0962895Z #34 5.043 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T07:02:55.0964675Z #34 5.043 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0966447Z #34 5.043 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0968249Z #34 5.044 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0969887Z #34 5.044 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0971729Z #34 5.044 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0973512Z #34 5.045 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0975444Z #34 5.045 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0977366Z #34 5.045 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0979291Z #34 5.045 copying 
vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0981210Z #34 5.045 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0983121Z #34 5.046 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0985148Z #34 5.046 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0987004Z #34 5.046 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0988892Z #34 5.047 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0990862Z #34 5.047 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0992756Z #34 5.047 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T07:02:55.0994658Z #34 5.047 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0996549Z #34 5.048 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.0998426Z #34 5.048 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1000382Z #34 5.048 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1002224Z #34 5.048 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1004125Z #34 5.049 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1006060Z #34 5.049 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1008002Z #34 5.049 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1009913Z #34 5.049 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1012013Z #34 5.050 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1013917Z #34 5.050 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1015881Z #34 5.050 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1017812Z #34 5.050 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1019712Z #34 5.050 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1021672Z #34 5.051 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1023792Z #34 5.051 copying 
vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1025392Z #34 5.051 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1027128Z #34 5.051 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1029011Z #34 5.052 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1030834Z #34 5.052 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1032429Z #34 5.052 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1034027Z #34 5.052 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1035623Z #34 5.053 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1037223Z #34 5.053 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1038831Z #34 5.053 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1040455Z #34 5.053 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1042110Z #34 5.054 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1043700Z #34 5.054 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1045316Z #34 5.054 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1046853Z #34 5.055 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1048365Z #34 5.055 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1050303Z #34 5.055 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1052167Z #34 5.055 copying 
vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1053972Z #34 5.056 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1055744Z #34 5.056 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1057421Z #34 5.056 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1059112Z #34 5.056 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1060920Z #34 5.057 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1062783Z #34 5.057 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1064451Z #34 5.057 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1065953Z #34 5.057 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T07:02:55.1067441Z #34 5.058 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1068928Z #34 5.058 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1070605Z #34 5.058 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1072422Z #34 5.058 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1074155Z #34 5.058 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1075987Z #34 5.059 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1077690Z #34 5.059 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1079363Z #34 5.059 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1081001Z #34 5.059 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1082624Z #34 5.060 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1084249Z #34 5.060 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1085908Z #34 5.060 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1087481Z #34 5.060 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1089079Z #34 5.061 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1091097Z #34 5.061 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1093032Z #34 5.061 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1094816Z #34 5.061 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1096596Z #34 5.062 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1098377Z #34 5.062 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1100179Z #34 5.062 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1101885Z #34 5.062 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1103724Z #34 5.063 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1105447Z #34 5.063 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1107594Z #34 5.063 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1109319Z #34 5.064 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1111045Z #34 5.064 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1112850Z #34 5.064 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1114504Z #34 5.064 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1116170Z #34 5.065 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1117816Z #34 5.065 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1119434Z #34 5.065 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1121048Z #34 5.065 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1122746Z #34 5.066 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1124330Z #34 5.066 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1125962Z #34 5.066 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1127626Z #34 5.066 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1129283Z #34 5.067 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1131020Z #34 5.067 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1132895Z #34 5.067 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1134665Z #34 5.067 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1136444Z #34 5.068 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.1138304Z #34 5.068 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2579855Z #34 5.068 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2581624Z #34 5.068 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2583521Z #34 5.069 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2585266Z #34 5.069 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2586986Z #34 5.069 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2588839Z #34 5.069 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2590502Z #34 5.070 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2592793Z #34 5.070 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2594600Z #34 5.070 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2596377Z #34 5.070 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2598165Z #34 5.070 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2599793Z #34 5.071 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2601376Z #34 5.071 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2602914Z #34 5.071 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2604550Z #34 5.072 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2606205Z #34 5.072 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2607874Z #34 5.072 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2609585Z #34 5.072 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T07:02:55.2611445Z #34 5.073 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2613235Z #34 5.073 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2615002Z #34 5.073 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2616725Z #34 5.073 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2618422Z #34 5.074 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2620191Z #34 5.074 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2621946Z #34 5.074 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2623840Z #34 5.074 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2625537Z #34 5.075 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2627132Z #34 5.075 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2628798Z #34 5.075 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2630459Z #34 5.075 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2632077Z #34 5.075 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2633661Z #34 5.076 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2635287Z #34 5.076 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2636935Z #34 5.076 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2638604Z #34 5.076 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2640289Z #34 5.077 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2641938Z #34 5.077 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2643619Z #34 5.077 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2645103Z #34 5.077 copying vllm/model_executor/layers/fused_moe/configs/README -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T07:02:55.2646833Z #34 5.078 copying vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2649316Z #34 5.078 copying vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2651734Z #34 5.078 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2654006Z #34 5.078 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2656346Z #34 5.079 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2658630Z #34 5.079 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2660920Z #34 5.079 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2663489Z #34 5.079 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2665582Z #34 5.080 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2667634Z #34 5.080 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2669723Z #34 5.080 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2671899Z #34 5.081 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2674346Z #34 5.081 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2676684Z #34 5.081 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2678904Z #34 5.081 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2681122Z #34 5.082 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2683316Z #34 5.082 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2685441Z #34 5.082 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2687657Z #34 5.082 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2689735Z #34 5.083 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2692104Z #34 5.083 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2694339Z #34 5.083 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2696716Z #34 5.083 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2698996Z #34 5.084 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2701274Z #34 5.084 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2703647Z #34 5.084 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2705772Z #34 5.084 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2708265Z #34 5.085 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2710367Z #34 5.085 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2712519Z #34 5.085 copying vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2714682Z #34 5.085 copying vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2716862Z #34 5.086 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2719294Z #34 5.086 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2721499Z #34 5.086 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2723645Z #34 5.086 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2725786Z #34 5.087 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2727926Z #34 5.087 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2730013Z #34 5.087 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2732339Z #34 5.087 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2734524Z #34 5.088 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2736751Z #34 5.088 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2739010Z #34 5.088 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2741256Z #34 5.088 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2743732Z #34 5.089 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2745880Z #34 5.089 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2748032Z #34 5.089 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2750679Z #34 5.089 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2752939Z #34 5.090 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2755247Z #34 5.090 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2757477Z #34 5.090 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2759672Z #34 5.090 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2762048Z #34 5.091 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2763999Z #34 5.091 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2765942Z #34 5.091 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2767915Z #34 5.091 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2769933Z #34 5.092 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2772266Z #34 5.092 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2774542Z #34 5.092 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2776809Z #34 5.092 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2779081Z #34 5.093 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2781323Z #34 5.093 copying 
vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2783618Z #34 5.093 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2785607Z #34 5.093 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2787814Z #34 5.094 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2789954Z #34 5.094 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2792039Z #34 5.094 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2794102Z #34 5.094 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2796163Z #34 5.095 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2798205Z #34 5.095 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2800375Z #34 5.095 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2802416Z #34 5.095 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2804437Z #34 5.096 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2806568Z #34 5.096 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2808834Z #34 5.096 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2811064Z #34 5.096 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2813458Z #34 5.097 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2815648Z #34 5.097 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2817928Z #34 5.097 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2820198Z #34 5.098 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2822458Z #34 5.098 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2824812Z #34 5.098 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2827032Z #34 5.098 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2829183Z #34 5.099 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2831336Z #34 5.099 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2833555Z #34 5.099 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2835624Z #34 5.099 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2837685Z #34 5.099 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2839736Z #34 5.100 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2841785Z #34 5.100 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2843874Z #34 5.100 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2846040Z #34 5.101 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2848189Z #34 5.101 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2851005Z #34 5.101 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2853249Z #34 5.101 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2855478Z #34 5.102 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2857714Z #34 5.102 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2859983Z #34 5.102 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2862210Z #34 5.102 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2864475Z #34 5.102 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2866609Z #34 5.103 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2868777Z #34 5.103 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2870883Z #34 5.103 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2873017Z #34 5.103 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2875181Z #34 5.104 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2877337Z #34 5.104 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2879570Z #34 5.104 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2881758Z #34 5.105 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2884041Z #34 5.105 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2886136Z #34 5.105 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2888216Z #34 5.105 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2890272Z #34 5.105 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2892633Z #34 5.106 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2894828Z #34 5.106 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2897078Z #34 5.106 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2899375Z #34 5.107 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2901592Z #34 5.107 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2903941Z #34 5.107 copying 
vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2905986Z #34 5.107 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2908024Z #34 5.108 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2910108Z #34 5.108 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2912222Z #34 5.108 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2914345Z #34 5.108 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2916530Z #34 5.109 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2918674Z #34 5.109 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2920760Z #34 5.109 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2922847Z #34 5.109 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2924904Z #34 5.110 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2926943Z #34 5.110 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2928983Z #34 5.110 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2931108Z #34 5.110 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2933444Z #34 5.111 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2935671Z #34 5.111 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2937935Z #34 5.111 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2940204Z #34 5.111 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2942524Z #34 5.111 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2944871Z #34 5.112 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2947123Z #34 5.112 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2949622Z #34 5.112 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2951814Z #34 5.113 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2953988Z #34 5.113 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2956219Z #34 5.113 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2958481Z #34 5.113 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2960754Z #34 5.114 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2963075Z #34 5.114 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2965355Z #34 5.114 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2967565Z #34 5.115 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2969698Z #34 5.115 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2972054Z #34 5.115 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2974267Z #34 5.115 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2976540Z #34 5.116 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2978791Z #34 5.116 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2981137Z #34 5.116 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2983515Z #34 5.116 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2985971Z #34 5.117 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2988158Z #34 5.117 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2990302Z #34 5.117 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2992406Z #34 5.117 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2994555Z #34 5.118 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2996898Z #34 5.118 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.2999211Z #34 5.118 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3001436Z #34 5.118 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3003652Z #34 5.119 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3005841Z #34 5.119 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3008005Z #34 5.119 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3010180Z #34 5.119 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3012562Z #34 5.120 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3014833Z #34 5.120 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3017033Z #34 5.120 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3019211Z #34 5.120 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3021452Z #34 5.121 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3023816Z #34 5.121 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3025947Z #34 5.121 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3028108Z #34 5.121 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3030312Z #34 5.122 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3032413Z #34 5.122 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3034523Z #34 5.122 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3036615Z #34 5.122 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3038675Z #34 5.123 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3040925Z #34 5.123 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3043044Z #34 5.123 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3045197Z #34 5.123 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3047385Z #34 5.124 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3049894Z #34 5.124 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3052230Z #34 5.124 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3054448Z #34 5.124 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3056674Z #34 5.125 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3058905Z #34 5.125 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3061084Z #34 5.125 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3063394Z #34 5.125 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3065485Z #34 5.126 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3067610Z #34 5.126 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3069736Z #34 5.126 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3071839Z #34 5.126 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3073961Z #34 5.127 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3076051Z #34 5.127 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3078179Z #34 5.127 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3080230Z #34 5.127 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3082313Z #34 5.128 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3084412Z #34 5.128 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3086543Z #34 5.128 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3088638Z #34 5.128 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3090670Z #34 5.129 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3093101Z #34 5.129 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3095306Z #34 5.129 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3097519Z #34 5.129 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3099771Z #34 5.130 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3102058Z #34 5.130 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3104463Z #34 5.130 copying 
vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3106483Z #34 5.130 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3108497Z #34 5.131 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3110302Z #34 5.131 copying vllm/model_executor/layers/quantization/utils/configs/README.md -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T07:02:55.3111516Z #34 5.131 copying vllm/plugins/lora_resolvers/README.md -> build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T07:02:55.3112624Z #34 5.131 copying vllm/transformers_utils/chat_templates/template_basic.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:55.3114119Z #34 5.132 copying vllm/transformers_utils/chat_templates/template_blip2.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:55.3115441Z #34 5.132 copying vllm/transformers_utils/chat_templates/template_chatml.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:55.3116813Z #34 5.132 copying vllm/transformers_utils/chat_templates/template_deepseek_vl2.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:55.3118167Z #34 5.132 copying 
vllm/transformers_utils/chat_templates/template_fuyu.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:55.3119509Z #34 5.132 copying vllm/transformers_utils/chat_templates/template_minicpmv45.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T07:02:55.3120372Z #34 5.146 running build_ext 2025-09-07T07:02:55.3124023Z #34 5.351 Using MAX_JOBS=42 as the number of jobs. 2025-09-07T07:02:55.4653280Z #34 5.353 Using NVCC_THREADS=4 as the number of nvcc threads. 2025-09-07T07:02:55.6012243Z #34 5.639 -- The CXX compiler identification is GNU 13.3.1 2025-09-07T07:02:55.7258463Z #34 5.653 -- Detecting CXX compiler ABI info 2025-09-07T07:02:55.7259011Z #34 5.764 -- Detecting CXX compiler ABI info - done 2025-09-07T07:02:55.9107490Z #34 5.783 -- Check for working CXX compiler: /opt/rh/gcc-toolset-13/root/usr/bin/c++ - skipped 2025-09-07T07:02:55.9108133Z #34 5.783 -- Detecting CXX compile features 2025-09-07T07:02:55.9108747Z #34 5.784 -- Detecting CXX compile features - done 2025-09-07T07:02:55.9109189Z #34 5.798 -- Build type: Release 2025-09-07T07:02:55.9109514Z #34 5.798 -- Target device: cuda 2025-09-07T07:02:55.9185612Z #34 5.957 -- Found Python: /opt/python/cp312-cp312/bin/python3 (found version "3.12.11") found components: Interpreter Development.Module Development.SABIModule 2025-09-07T07:02:56.0691890Z #34 5.957 -- Found python matching: /opt/python/cp312-cp312/bin/python3. 
2025-09-07T07:02:57.7935046Z #34 7.832 -- Found CUDA: /usr/local/cuda (found version "12.9") 2025-09-07T07:02:58.8702233Z #34 8.908 -- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.1 2025-09-07T07:02:59.0325827Z #34 8.920 -- Detecting CUDA compiler ABI info 2025-09-07T07:02:59.8819372Z #34 9.920 -- Detecting CUDA compiler ABI info - done 2025-09-07T07:03:00.0800879Z #34 9.983 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped 2025-09-07T07:03:00.0801463Z #34 9.986 -- Detecting CUDA compile features 2025-09-07T07:03:00.0801887Z #34 9.987 -- Detecting CUDA compile features - done 2025-09-07T07:03:00.0802436Z #34 10.00 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.9.86") 2025-09-07T07:03:00.0802976Z #34 10.01 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-09-07T07:03:00.0803445Z #34 10.12 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed 2025-09-07T07:03:00.2520865Z #34 10.12 -- Looking for pthread_create in pthreads 2025-09-07T07:03:00.2521389Z #34 10.19 -- Looking for pthread_create in pthreads - not found 2025-09-07T07:03:00.2521858Z #34 10.19 -- Looking for pthread_create in pthread 2025-09-07T07:03:00.2522310Z #34 10.29 -- Looking for pthread_create in pthread - found 2025-09-07T07:03:00.4848482Z #34 10.29 -- Found Threads: TRUE 2025-09-07T07:03:00.4849435Z #34 10.39 -- PyTorch: CUDA detected: 12.9 2025-09-07T07:03:00.4850090Z #34 10.39 -- PyTorch: CUDA nvcc is: /usr/local/cuda/bin/nvcc 2025-09-07T07:03:00.4850597Z #34 10.39 -- PyTorch: CUDA toolkit directory: /usr/local/cuda 2025-09-07T07:03:00.4851126Z #34 10.52 -- PyTorch: Header version is: 12.9 2025-09-07T07:03:00.6826359Z #34 10.55 -- Found Python: /opt/python/cp312-cp312/bin/python3 (found version "3.12.11") found components: Interpreter 2025-09-07T07:03:00.6827599Z #34 10.55 CMake Warning at /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-09-07T07:03:00.6828471Z 
#34 10.55 Failed to compute shorthash for libnvrtc.so 2025-09-07T07:03:00.6828907Z #34 10.55 Call Stack (most recent call first): 2025-09-07T07:03:00.6829644Z #34 10.55 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-09-07T07:03:00.6830756Z #34 10.55 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-09-07T07:03:00.6831539Z #34 10.55 CMakeLists.txt:80 (find_package) 2025-09-07T07:03:00.6831879Z #34 10.55 2025-09-07T07:03:00.6832111Z #34 10.55 2025-09-07T07:03:00.6832438Z #34 10.55 -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-09-07T07:03:00.6833022Z #34 10.55 -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-09-07T07:03:00.6833577Z #34 10.55 -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-09-07T07:03:00.6834104Z #34 10.55 -- USE_CUFILE is set to 0. Compiling without cuFile support 2025-09-07T07:03:00.6834990Z #34 10.55 CMake Warning at /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:317 (message): 2025-09-07T07:03:00.6835958Z #34 10.55 pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore 2025-09-07T07:03:00.6836584Z #34 10.55 its value. Please configure `TORCH_CUDA_ARCH_LIST` instead. 
2025-09-07T07:03:00.6837070Z #34 10.55 Call Stack (most recent call first): 2025-09-07T07:03:00.6837813Z #34 10.55 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-09-07T07:03:00.6839232Z #34 10.55 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-09-07T07:03:00.6839985Z #34 10.55 CMakeLists.txt:80 (find_package) 2025-09-07T07:03:00.6840327Z #34 10.55 2025-09-07T07:03:00.6840534Z #34 10.55 2025-09-07T07:03:00.6841461Z #34 10.55 -- Added CUDA NVCC flags for: -gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_100,code=sm_100;-gencode;arch=compute_120,code=sm_120 2025-09-07T07:03:00.6842875Z #34 10.57 CMake Warning at /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-09-07T07:03:00.6843706Z #34 10.57 static library kineto_LIBRARY-NOTFOUND not found. 2025-09-07T07:03:00.6844150Z #34 10.57 Call Stack (most recent call first): 2025-09-07T07:03:00.6844944Z #34 10.57 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-09-07T07:03:00.6845756Z #34 10.57 CMakeLists.txt:80 (find_package) 2025-09-07T07:03:00.6846076Z #34 10.57 2025-09-07T07:03:00.6846296Z #34 10.57 2025-09-07T07:03:00.6846751Z #34 10.57 -- Found Torch: /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib/libtorch.so 2025-09-07T07:03:00.6847379Z #34 10.57 CMake Warning at CMakeLists.txt:112 (message): 2025-09-07T07:03:00.6847965Z #34 10.57 Pytorch version 2.8.0 expected for CUDA build, saw 2.9.0 instead. 
2025-09-07T07:03:00.6848420Z #34 10.57 2025-09-07T07:03:00.6848634Z #34 10.57 2025-09-07T07:03:00.6849294Z #34 10.57 -- CUDA target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T07:03:00.6849926Z #34 10.57 -- CUDA supported target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T07:03:02.8479625Z #34 12.89 -- FetchContent base directory: /workspace/.deps 2025-09-07T07:03:02.9988062Z #34 12.89 -- Enabling cumem allocator extension. 2025-09-07T07:03:06.6952536Z #34 16.73 -- CMake Version: 4.1.0 2025-09-07T07:03:06.6952985Z #34 16.73 -- CUTLASS 4.0.0 2025-09-07T07:03:06.9233104Z #34 16.74 -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86") 2025-09-07T07:03:06.9233834Z #34 16.81 -- CUDART: /usr/local/cuda/lib64/libcudart.so 2025-09-07T07:03:06.9234312Z #34 16.81 -- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so 2025-09-07T07:03:06.9234768Z #34 16.81 -- NVRTC: /usr/local/cuda/lib64/libnvrtc.so 2025-09-07T07:03:06.9235195Z #34 16.81 -- Default Install Location: install 2025-09-07T07:03:06.9341761Z #34 16.97 -- Found Python3: /opt/python/cp312-cp312/bin/python3.12 (found suitable version "3.12.11", minimum required is "3.5") found components: Interpreter 2025-09-07T07:03:07.0644621Z #34 17.10 -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a;100;100a;120;120a;101;101a;100f;120f;101f 2025-09-07T07:03:07.0645414Z #34 17.10 -- Enable caching of reference results in conv unit tests 2025-09-07T07:03:07.0645947Z #34 17.10 -- Enable rigorous conv problem sizes in conv unit tests 2025-09-07T07:03:07.2804284Z #34 17.10 -- Grid Dependency Control (GDC) is enabled for SM100 kernels (required for programmatic dependent launches). 
2025-09-07T07:03:07.2805046Z #34 17.10 -- Using the following NVCC flags: 2025-09-07T07:03:07.2805426Z #34 17.10 --expt-relaxed-constexpr 2025-09-07T07:03:07.2805797Z #34 17.10 -ftemplate-backtrace-limit=0 2025-09-07T07:03:07.2806169Z #34 17.10 -DCUTLASS_TEST_LEVEL=0 2025-09-07T07:03:07.2806532Z #34 17.10 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 2025-09-07T07:03:07.2806981Z #34 17.10 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 2025-09-07T07:03:07.2807393Z #34 17.10 -DCUTLASS_DEBUG_TRACE_LEVEL=0 2025-09-07T07:03:07.2807774Z #34 17.10 -DCUTLASS_SM100_FAMILY_ARCHS_ENABLED 2025-09-07T07:03:07.2808145Z #34 17.10 -Xcompiler=-Wconversion 2025-09-07T07:03:07.2808514Z #34 17.10 -Xcompiler=-fno-strict-aliasing 2025-09-07T07:03:07.2808869Z #34 17.15 -- Configuring cublas ... 2025-09-07T07:03:07.2809204Z #34 17.15 -- cuBLAS Disabled. 2025-09-07T07:03:07.2809668Z #34 17.15 -- Configuring cuBLAS ... done. 2025-09-07T07:03:07.2810161Z #34 17.17 -- Marlin generation script hash: abd33f08f337455f84516269e0f85ed7 2025-09-07T07:03:07.2810680Z #34 17.17 -- Last run Marlin generate script hash: 2025-09-07T07:03:08.1889019Z #34 18.23 -- Marlin generation completed successfully. 
2025-09-07T07:03:08.3445763Z #34 18.23 -- Building Marlin kernels for archs: 8.0;8.7;9.0+PTX 2025-09-07T07:03:08.3446693Z #34 18.23 -- Building AllSpark kernels for archs: 8.0;8.9 2025-09-07T07:03:08.3447225Z #34 18.23 -- Building scaled_mm_c3x_sm90 for archs: 9.0a 2025-09-07T07:03:08.3447670Z #34 18.23 -- Building scaled_mm_c3x_sm120 for archs: 12.0a 2025-09-07T07:03:08.3448132Z #34 18.23 -- Building scaled_mm_c3x_sm100 for archs: 10.0a 2025-09-07T07:03:08.3448593Z #34 18.23 -- Building scaled_mm_c2x for archs: 8.0;8.9+PTX 2025-09-07T07:03:08.3449451Z #34 18.23 -- Building sparse_scaled_mm_c3x for archs: 9.0a 2025-09-07T07:03:08.3449955Z #34 18.23 -- Building NVFP4 for archs: 12.0a 2025-09-07T07:03:08.3450361Z #34 18.23 -- Building NVFP4 for archs: 10.0a 2025-09-07T07:03:08.3450754Z #34 18.23 -- Building CUTLASS MLA for archs: 10.0a 2025-09-07T07:03:08.3451278Z #34 18.23 -- Building grouped_mm_c3x for archs: 9.0a 2025-09-07T07:03:08.3451723Z #34 18.23 -- Building grouped_mm_c3x for archs: 10.0a 2025-09-07T07:03:08.3452147Z #34 18.23 -- Building moe_data for archs: 9.0a;10.0a 2025-09-07T07:03:08.3452816Z #34 18.23 -- Building blockwise_scaled_group_mm_sm100 for archs: 10.0a 2025-09-07T07:03:08.3453422Z #34 18.23 -- Machete generation script hash: 54d14089cd629a0eee221067f44a0b46 2025-09-07T07:03:08.3453971Z #34 18.23 -- Last run machete generate script hash: 2025-09-07T07:03:08.4900211Z #34 18.53 -- Machete generation completed successfully. 2025-09-07T07:03:08.6438388Z #34 18.53 -- Building Machete kernels for archs: 9.0a 2025-09-07T07:03:08.6439135Z #34 18.53 -- Building W4A8 kernels for archs: 9.0a 2025-09-07T07:03:08.6439539Z #34 18.53 -- Enabling C extension. 2025-09-07T07:03:08.6440038Z #34 18.53 -- Marlin MOE generation script hash: e42dc1ed5a7c83988cc21a1bf57c6b6d 2025-09-07T07:03:08.6440581Z #34 18.53 -- Last run Marlin MOE generate script hash: 2025-09-07T07:03:09.1414293Z #34 19.18 -- Marlin MOE generation completed successfully. 
2025-09-07T07:03:09.2997251Z #34 19.18 -- Building Marlin MOE kernels for archs: 8.0;8.7;9.0+PTX 2025-09-07T07:03:09.2998095Z #34 19.18 -- Enabling moe extension. 2025-09-07T07:03:09.2999083Z #34 19.19 CMake Warning (dev) at /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:1564 (cmake_parse_arguments): 2025-09-07T07:03:09.3000212Z #34 19.19 The BUILD_COMMAND keyword was followed by an empty string or no value at 2025-09-07T07:03:09.3001097Z #34 19.19 all. Policy CMP0174 is not set, so cmake_parse_arguments() will unset the 2025-09-07T07:03:09.3001697Z #34 19.19 ARG_BUILD_COMMAND variable rather than setting it to an empty string. 2025-09-07T07:03:09.3002190Z #34 19.19 Call Stack (most recent call first): 2025-09-07T07:03:09.3003076Z #34 19.19 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145:EVAL:2 (__FetchContent_doPopulation) 2025-09-07T07:03:09.3004376Z #34 19.19 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145 (cmake_language) 2025-09-07T07:03:09.3005616Z #34 19.19 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2384 (__FetchContent_Populate) 2025-09-07T07:03:09.3006564Z #34 19.19 cmake/external_projects/flashmla.cmake:30 (FetchContent_MakeAvailable) 2025-09-07T07:03:09.3007064Z #34 19.19 CMakeLists.txt:942 (include) 2025-09-07T07:03:09.3007512Z #34 19.19 This warning is for project developers. Use -Wno-dev to suppress it. 
2025-09-07T07:03:09.3007961Z #34 19.19 2025-09-07T07:03:09.3008906Z #34 19.19 CMake Warning (dev) at /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:1564 (cmake_parse_arguments): 2025-09-07T07:03:09.3009940Z #34 19.19 The CONFIGURE_COMMAND keyword was followed by an empty string or no value 2025-09-07T07:03:09.3010553Z #34 19.19 at all. Policy CMP0174 is not set, so cmake_parse_arguments() will unset 2025-09-07T07:03:09.3011237Z #34 19.19 the ARG_CONFIGURE_COMMAND variable rather than setting it to an empty 2025-09-07T07:03:09.3011908Z #34 19.19 string. 2025-09-07T07:03:09.3012201Z #34 19.19 Call Stack (most recent call first): 2025-09-07T07:03:09.3013380Z #34 19.19 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145:EVAL:2 (__FetchContent_doPopulation) 2025-09-07T07:03:09.3014839Z #34 19.19 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145 (cmake_language) 2025-09-07T07:03:09.3016219Z #34 19.19 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2384 (__FetchContent_Populate) 2025-09-07T07:03:09.3017397Z #34 19.19 cmake/external_projects/flashmla.cmake:30 (FetchContent_MakeAvailable) 2025-09-07T07:03:09.3018043Z #34 19.19 CMakeLists.txt:942 (include) 2025-09-07T07:03:09.3018521Z #34 19.19 This warning is for project developers. Use -Wno-dev to suppress it. 
2025-09-07T07:03:09.3018987Z #34 19.19 2025-09-07T07:03:12.1650133Z #34 22.20 -- FlashMLA is available at /workspace/.deps/flashmla-src 2025-09-07T07:03:18.8060931Z #34 28.84 -- Build type: Release 2025-09-07T07:03:18.8061339Z #34 28.84 -- Target device: cuda 2025-09-07T07:03:19.0018541Z #34 28.89 -- Found Python: /opt/python/cp312-cp312/bin/python3 (found version "3.12.11") found components: Interpreter Development.Module Development.SABIModule 2025-09-07T07:03:20.6893700Z #34 30.73 CMake Warning at .deps/vllm-flash-attn-src/CMakeLists.txt:75 (message): 2025-09-07T07:03:20.6894424Z #34 30.73 Pytorch version 2.4.0 expected for CUDA build, saw 2.9.0 instead. 2025-09-07T07:03:20.6894917Z #34 30.73 2025-09-07T07:03:20.6895157Z #34 30.73 2025-09-07T07:03:20.8402817Z #34 30.73 -- CUDA target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T07:03:20.8403438Z #34 30.73 -- CUDA supported target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T07:03:22.9913768Z #34 33.03 -- FA2_ARCHS: 8.0+PTX 2025-09-07T07:03:23.1052482Z #34 33.04 -- FA3_ARCHS: 9.0a;8.0 2025-09-07T07:03:23.1053069Z #34 33.04 -- vllm-flash-attn is available at /workspace/.deps/vllm-flash-attn-src 2025-09-07T07:03:23.1053636Z #34 33.04 -- Configuring done (27.5s) 2025-09-07T07:03:23.1053985Z #34 33.13 -- Generating done (0.1s) 2025-09-07T07:03:23.1054535Z #34 33.13 -- Build files have been written to: /workspace/build/temp.linux-x86_64-cpython-312 2025-09-07T07:03:23.1055155Z #34 33.14 Using MAX_JOBS=42 as the number of jobs. 2025-09-07T07:03:23.2581113Z #34 33.15 Using NVCC_THREADS=4 as the number of nvcc threads. 
2025-09-07T07:03:23.6565179Z #34 33.69 [1/510] Building CXX object CMakeFiles/cumem_allocator.dir/csrc/cumem_allocator.cpp.o 2025-09-07T07:08:00.2932101Z #34 310.3 [2/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/merge_attn_states.cu.o 2025-09-07T07:08:04.6685571Z #34 314.7 [3/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/vertical_slash_index.cu.o 2025-09-07T07:08:06.3297203Z #34 316.4 [4/510] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o 2025-09-07T07:08:06.7956079Z #34 316.8 [5/510] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o 2025-09-07T07:08:08.2452321Z #34 318.3 [6/510] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o 2025-09-07T07:08:09.3094570Z #34 319.3 [7/510] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_quant_kernels.cu.o 2025-09-07T07:08:22.1628099Z #34 332.2 [8/510] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o 2025-09-07T07:09:41.3562519Z #34 411.4 [9/510] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o 2025-09-07T07:12:39.9814600Z #34 590.0 [10/510] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_view.cu.o 2025-09-07T07:12:41.0520107Z #34 591.1 [11/510] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o 2025-09-07T07:12:58.3105722Z #34 608.3 [12/510] Building CUDA object CMakeFiles/_C.dir/csrc/sampler.cu.o 2025-09-07T07:13:08.7323272Z #34 618.8 [13/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o 2025-09-07T07:13:09.8692524Z #34 619.9 [14/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o 2025-09-07T07:13:14.2540913Z #34 624.3 [15/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o 2025-09-07T07:13:15.0809468Z #34 625.1 [16/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fused_kernels/fused_layernorm_dynamic_per_token_quant.cu.o 2025-09-07T07:13:24.9564125Z #34 
635.0 [17/510] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o 2025-09-07T07:13:42.6544098Z #34 652.7 [18/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o 2025-09-07T07:14:27.6210599Z #34 697.7 [19/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/activation_kernels.cu.o 2025-09-07T07:14:36.6395215Z #34 706.7 [20/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v1.cu.o 2025-09-07T07:14:36.9262004Z #34 707.0 [21/510] Building CXX object CMakeFiles/_C.dir/csrc/cutlass_extensions/common.cpp.o 2025-09-07T07:14:47.7061570Z #34 717.7 [22/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v2.cu.o 2025-09-07T07:17:18.0545636Z #34 868.1 [23/510] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o 2025-09-07T07:17:31.7353402Z #34 881.8 [24/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_quant_entry.cu.o 2025-09-07T07:17:39.7594072Z #34 889.8 [25/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_entry.cu.o 2025-09-07T07:17:40.6541990Z #34 890.7 [26/510] Building CUDA object CMakeFiles/_C.dir/csrc/permute_cols.cu.o 2025-09-07T07:17:42.1940732Z #34 892.2 [27/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o 2025-09-07T07:17:43.8513414Z #34 893.9 [28/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o 2025-09-07T07:19:05.3465935Z #34 975.4 [29/510] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_entry.cu.o 2025-09-07T07:19:08.0891115Z #34 978.1 [30/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/mla/cutlass_mla_entry.cu.o 2025-09-07T07:19:22.6927601Z #34 992.7 [31/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/per_token_group_quant.cu.o 2025-09-07T07:20:42.0088366Z #34 1072.0 [32/510] Building CUDA object 
CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_kfe2m1f.cu.o 2025-09-07T07:20:49.2924713Z #34 1079.3 [33/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_kfe2m1f.cu.o 2025-09-07T07:20:59.1563923Z #34 1089.2 [34/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_kfe4m3fn.cu.o 2025-09-07T07:21:35.6415932Z #34 1125.7 [35/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_ku4.cu.o 2025-09-07T07:21:58.3996299Z #34 1148.4 [36/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_ku4b8.cu.o 2025-09-07T07:22:07.3684687Z #34 1157.4 [37/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_ku8b128.cu.o 2025-09-07T07:22:39.2080787Z #34 1189.2 [38/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_kfe4m3fn.cu.o 2025-09-07T07:22:55.0006036Z #34 1205.0 [39/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu.o 2025-09-07T07:23:36.2676231Z #34 1246.3 [40/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_ku4.cu.o 2025-09-07T07:23:49.8918133Z #34 1259.9 [41/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_ku4b8.cu.o 2025-09-07T07:24:11.8650773Z #34 1281.9 [42/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o 2025-09-07T07:24:29.5593187Z #34 1299.6 [43/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o 2025-09-07T07:24:38.7797563Z #34 1308.8 [44/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o 2025-09-07T07:25:05.5455417Z #34 1335.6 [45/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o 2025-09-07T07:25:09.3566607Z #34 1339.4 [46/510] Building CUDA object 
CMakeFiles/_C.dir/csrc/quantization/gptq_allspark/allspark_repack.cu.o 2025-09-07T07:25:13.6575953Z #34 1343.7 [47/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_ku8b128.cu.o 2025-09-07T07:25:37.8219561Z #34 1367.9 [48/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x_sm90.cu.o 2025-09-07T07:25:49.7407953Z #34 1379.8 [49/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_allspark/allspark_qgemm_w8a16.cu.o 2025-09-07T07:26:57.3981363Z #34 1447.4 [50/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x_sm120.cu.o 2025-09-07T07:27:32.7460296Z #34 1482.8 [51/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x_sm100.cu.o 2025-09-07T07:28:49.3527680Z #34 1559.4 [52/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm90_fp8.cu.o 2025-09-07T07:29:53.0399234Z #34 1623.1 [53/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm120_fp8.cu.o 2025-09-07T07:29:56.7635118Z #34 1626.8 [54/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm120_fp8.cu.o 2025-09-07T07:30:54.4758080Z #34 1684.5 [55/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm90_int8.cu.o 2025-09-07T07:31:01.6746154Z #34 1691.7 [56/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_quant_kernels.cu.o 2025-09-07T07:31:01.6747390Z #34 1691.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:31:01.6749231Z #34 1691.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-09-07T07:31:01.6750670Z #34 1691.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:31:01.6752060Z #34 1691.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:31:40.5158397Z #34 1730.6 [57/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_azp_sm90_int8.cu.o 2025-09-07T07:32:04.4668127Z #34 1754.5 [58/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm90_fp8.cu.o 2025-09-07T07:32:27.6390364Z #34 1777.7 [59/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/activation_nvfp4_quant_fusion_kernels.cu.o 2025-09-07T07:32:27.6391743Z #34 1777.7 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:32:27.6393348Z #34 1777.7 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:32:27.6394793Z #34 1777.7 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:32:27.6396578Z #34 1777.7 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-09-07T07:32:35.4247006Z #34 1785.5 [60/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm100_fp8.cu.o 2025-09-07T07:33:24.5271795Z #34 1834.6 [61/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_experts_quant.cu.o 2025-09-07T07:33:24.5273993Z #34 1834.6 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb0ELb1EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:33:24.5275542Z #34 1834.6 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb0ELb0EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:33:24.5277012Z #34 1834.6 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb0ELb1EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:33:24.5278437Z #34 1834.6 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb0ELb0EEEviiPKT_PKfPjS7_S7_S7_i is out of range. 
.minnctapersm will be ignored 2025-09-07T07:34:42.9800044Z #34 1913.0 [62/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8.cu.o 2025-09-07T07:35:35.1721406Z #34 1965.2 [63/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/moe_data.cu.o 2025-09-07T07:36:48.1686044Z #34 2038.2 [64/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_sm120_kernels.cu.o 2025-09-07T07:37:28.5672228Z #34 2078.6 [65/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/mla/cutlass_mla_kernels.cu.o 2025-09-07T07:37:49.0034550Z #34 2099.0 [66/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_dispatch.cu.o 2025-09-07T07:37:58.9993142Z #34 2109.0 [67/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_kernels.cu.o 2025-09-07T07:38:38.6719511Z #34 2148.7 [68/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/blockwise_scaled_group_mm_sm100.cu.o 2025-09-07T07:39:49.7335884Z #34 2219.8 [69/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/grouped_mm_c3x_sm100.cu.o 2025-09-07T07:39:59.2921178Z #34 2229.3 [70/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/mla/sm100_cutlass_mla_kernel.cu.o 2025-09-07T07:40:28.2564285Z #34 2258.3 [71/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/grouped_mm_c3x_sm90.cu.o 2025-09-07T07:42:22.1357612Z #34 2372.2 [72/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o 2025-09-07T07:43:12.1683284Z #34 2422.2 [73/510] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_c3x.cu.o 2025-09-07T07:44:49.0697939Z #34 2519.1 [74/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_prepack.cu.o 2025-09-07T07:45:16.7033585Z #34 2546.7 [75/510] Building CUDA object 
CMakeFiles/_C.dir/csrc/quantization/machete/machete_pytorch.cu.o 2025-09-07T07:45:16.7034996Z #34 2546.7 nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-09-07T07:45:18.3837688Z #34 2548.4 [76/510] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o 2025-09-07T07:46:30.8551400Z #34 2620.9 [77/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part1.cu.o 2025-09-07T07:47:32.8481128Z #34 2682.9 [78/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part2.cu.o 2025-09-07T07:47:43.6721577Z #34 2693.7 [79/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part3.cu.o 2025-09-07T07:47:48.7592751Z #34 2698.8 [80/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part4.cu.o 2025-09-07T07:48:57.9773643Z #34 2768.0 [81/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part5.cu.o 2025-09-07T07:49:11.6961090Z #34 2781.7 [82/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/moe_align_sum_kernels.cu.o 2025-09-07T07:49:32.4466164Z #34 2802.5 [83/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part7.cu.o 2025-09-07T07:49:44.4733639Z #34 2814.5 [84/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part6.cu.o 2025-09-07T07:49:53.7135458Z #34 2823.8 [85/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part8.cu.o 2025-09-07T07:50:44.8079274Z #34 2874.8 [86/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o 2025-09-07T07:51:56.4196103Z #34 2946.5 [87/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/moe_wna16.cu.o 
2025-09-07T07:52:01.1270376Z #34 2951.2 [88/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/grouped_topk_kernels.cu.o 2025-09-07T07:52:02.6111178Z #34 2952.6 [89/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.cu.o 2025-09-07T07:52:36.4231704Z #34 2986.5 [90/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_kfe2m1f.cu.o 2025-09-07T07:53:01.8687495Z #34 3011.9 [91/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_kfe4m3fn.cu.o 2025-09-07T07:53:23.1699544Z #34 3033.2 [92/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/moe_permute_unpermute_op.cu.o 2025-09-07T07:53:51.1929082Z #34 3061.2 [93/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_ku4.cu.o 2025-09-07T07:53:53.3314171Z #34 3063.4 [94/510] Building CXX object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/flash_api.cpp.o 2025-09-07T07:53:54.5142030Z #34 3064.6 [95/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels/get_mla_metadata.cu.o 2025-09-07T07:53:56.2513031Z #34 3066.3 [96/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels/mla_combine.cu.o 2025-09-07T07:53:58.0217651Z #34 3068.1 [97/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels/splitkv_mla.cu.o 2025-09-07T07:53:59.8067232Z #34 3069.8 [98/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels_fp8/flash_fwd_mla_fp8_sm90.cu.o 2025-09-07T07:54:01.7325500Z #34 3071.8 [99/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/flash_api.cpp.o 2025-09-07T07:54:03.6495068Z #34 3073.7 [100/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/flash_api_sparse.cpp.o 2025-09-07T07:54:05.5118961Z #34 3075.5 [101/510] Building CXX object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/flash_api_torch_lib.cpp.o 2025-09-07T07:54:08.0253911Z #34 3078.1 [102/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_bf16_causal_sm80.cu.o 2025-09-07T07:54:10.5486696Z #34 3080.6 [103/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.cu.o 2025-09-07T07:54:13.0650436Z #34 3083.1 [104/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_fp16_causal_sm80.cu.o 2025-09-07T07:54:15.5657304Z #34 3085.6 [105/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_fp16_sm80.cu.o 2025-09-07T07:54:18.0369678Z #34 3088.1 [106/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_bf16_causal_sm80.cu.o 2025-09-07T07:54:20.5124470Z #34 3090.6 [107/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_bf16_sm80.cu.o 2025-09-07T07:54:22.9755470Z #34 3093.0 [108/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_fp16_causal_sm80.cu.o 2025-09-07T07:54:23.6569453Z #34 3093.7 [109/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_ku4b8.cu.o 2025-09-07T07:54:25.4561401Z #34 3095.5 [110/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_fp16_sm80.cu.o 2025-09-07T07:54:26.1434762Z #34 3096.2 [111/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_bf16_causal_sm80.cu.o 2025-09-07T07:54:27.9559585Z #34 3098.0 [112/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_bf16_sm80.cu.o 2025-09-07T07:54:28.6322779Z #34 3098.7 [113/510] Building CUDA 
object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_fp16_causal_sm80.cu.o 2025-09-07T07:54:30.4753400Z #34 3100.5 [114/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_fp16_sm80.cu.o 2025-09-07T07:54:31.1644272Z #34 3101.2 [115/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_bf16_causal_sm80.cu.o 2025-09-07T07:54:33.0047332Z #34 3103.0 [116/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_bf16_sm80.cu.o 2025-09-07T07:54:33.6940494Z #34 3103.7 [117/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_fp16_causal_sm80.cu.o 2025-09-07T07:54:35.5708763Z #34 3105.6 [118/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_fp16_sm80.cu.o 2025-09-07T07:54:36.2507125Z #34 3106.3 [119/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_bf16_causal_sm80.cu.o 2025-09-07T07:54:38.1312502Z #34 3108.2 [120/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_bf16_sm80.cu.o 2025-09-07T07:54:38.7955734Z #34 3108.8 [121/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_fp16_causal_sm80.cu.o 2025-09-07T07:54:40.6946038Z #34 3110.7 [122/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_fp16_sm80.cu.o 2025-09-07T07:54:41.3860713Z #34 3111.4 [123/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_bf16_causal_sm80.cu.o 2025-09-07T07:54:43.2873725Z #34 3113.3 [124/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_bf16_sm80.cu.o 
2025-09-07T07:54:43.9281241Z #34 3114.0 [125/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.cu.o 2025-09-07T07:54:45.8774900Z #34 3115.9 [126/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_fp16_sm80.cu.o 2025-09-07T07:54:46.5012203Z #34 3116.5 [127/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_bf16_causal_sm80.cu.o 2025-09-07T07:54:48.4307875Z #34 3118.5 [128/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_bf16_sm80.cu.o 2025-09-07T07:54:49.0905032Z #34 3119.1 [129/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_fp16_causal_sm80.cu.o 2025-09-07T07:54:51.0229520Z #34 3121.1 [130/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_fp16_sm80.cu.o 2025-09-07T07:54:51.6202411Z #34 3121.7 [131/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_bf16_causal_sm80.cu.o 2025-09-07T07:54:53.7002731Z #34 3123.7 [132/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_bf16_sm80.cu.o 2025-09-07T07:54:54.1804957Z #34 3124.2 [133/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_fp16_causal_sm80.cu.o 2025-09-07T07:54:56.2775725Z #34 3126.3 [134/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_fp16_sm80.cu.o 2025-09-07T07:54:56.7782950Z #34 3126.8 [135/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_bf16_causal_sm80.cu.o 2025-09-07T07:54:58.9255974Z #34 3129.0 [136/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_bf16_sm80.cu.o 2025-09-07T07:54:59.7807805Z #34 3129.8 [137/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_causal_sm80.cu.o 2025-09-07T07:55:01.6226946Z #34 3131.7 [138/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu.o 2025-09-07T07:55:02.7712622Z #34 3132.8 [139/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_causal_sm80.cu.o 2025-09-07T07:55:04.2548996Z #34 3134.3 [140/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu.o 2025-09-07T07:55:05.9572158Z #34 3136.0 [141/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_fp16_causal_sm80.cu.o 2025-09-07T07:55:06.8578497Z #34 3136.9 [142/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_fp16_sm80.cu.o 2025-09-07T07:55:08.8568100Z #34 3138.9 [143/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_causal_sm80.cu.o 2025-09-07T07:55:09.4497630Z #34 3139.5 [144/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_sm80.cu.o 2025-09-07T07:55:11.4217342Z #34 3141.5 [145/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_causal_sm80.cu.o 2025-09-07T07:55:12.0100561Z #34 3142.0 [146/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_sm80.cu.o 2025-09-07T07:55:13.9577781Z #34 3144.0 [147/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_causal_sm80.cu.o 2025-09-07T07:55:14.5773382Z #34 3144.6 [148/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_sm80.cu.o 2025-09-07T07:55:16.5079051Z #34 3146.5 [149/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_causal_sm80.cu.o 2025-09-07T07:55:17.1555946Z #34 3147.2 [150/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_sm80.cu.o 2025-09-07T07:55:19.0277865Z #34 3149.1 [151/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_causal_sm80.cu.o 2025-09-07T07:55:19.9221346Z #34 3150.0 [152/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_sm80.cu.o 2025-09-07T07:55:21.5808677Z #34 3151.6 [153/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_causal_sm80.cu.o 2025-09-07T07:55:22.5702594Z #34 3152.6 [154/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_sm80.cu.o 2025-09-07T07:55:23.3227313Z #34 3153.4 [155/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_kfe2m1f.cu.o 2025-09-07T07:55:24.1340047Z #34 3154.2 [156/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_causal_sm80.cu.o 2025-09-07T07:55:25.1885451Z #34 3155.2 [157/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_sm80.cu.o 2025-09-07T07:55:25.8679396Z #34 3155.9 [158/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_causal_sm80.cu.o 2025-09-07T07:55:26.6639707Z #34 3156.7 [159/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu.o 2025-09-07T07:55:27.6689732Z #34 3157.7 [160/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_causal_sm80.cu.o 2025-09-07T07:55:28.4331084Z #34 3158.5 [161/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_sm80.cu.o 2025-09-07T07:55:29.1526386Z #34 3159.2 [162/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_causal_sm80.cu.o 2025-09-07T07:55:30.2466789Z #34 3160.3 [163/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_ku8b128.cu.o 2025-09-07T07:55:30.4340363Z #34 3160.3 [164/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.cu.o 2025-09-07T07:55:36.0804660Z #34 3166.1 [165/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_prepare_scheduler.cu.o 2025-09-07T07:55:40.8484017Z #34 3170.9 [166/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_kfe4m3fn.cu.o 2025-09-07T07:55:43.9392017Z #34 3174.0 [167/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_api_torch_lib.cpp.o 2025-09-07T07:55:45.4015974Z #34 3175.4 [168/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_api.cpp.o 2025-09-07T07:56:22.5820576Z #34 3212.6 [169/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_ku4.cu.o 2025-09-07T07:56:39.2160157Z #34 3229.3 [170/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/ops.cu.o 2025-09-07T07:57:06.4560407Z 
#34 3256.5 [171/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_ku4b8.cu.o 2025-09-07T07:57:39.3383822Z #34 3289.4 [172/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_fwd_combine.cu.o 2025-09-07T07:57:43.5625511Z #34 3293.6 [173/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_ku8b128.cu.o 2025-09-07T07:57:53.7528662Z #34 3303.8 [174/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu.o 2025-09-07T08:01:26.5722760Z #34 3516.6 [175/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.cu.o 2025-09-07T08:01:35.8384929Z #34 3525.9 [176/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.cu.o 2025-09-07T08:01:37.4767485Z #34 3527.5 [177/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.cu.o 2025-09-07T08:01:40.3543675Z #34 3530.4 [178/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:02:08.0850341Z #34 3558.1 [179/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:02:23.2094526Z #34 3573.2 [180/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcap_sm90.cu.o 2025-09-07T08:02:43.6780539Z #34 3593.7 [181/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.cu.o 2025-09-07T08:04:00.3126965Z #34 3670.4 [182/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcap_packgqa_sm90.cu.o 
2025-09-07T08:04:48.6034385Z #34 3718.6 [183/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.cu.o 2025-09-07T08:04:54.6704905Z #34 3724.7 [184/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_softcap_sm90.cu.o 2025-09-07T08:05:34.9887563Z #34 3765.0 [185/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.cu.o 2025-09-07T08:05:38.4021387Z #34 3768.4 [186/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:05:46.9931639Z #34 3777.0 [187/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.cu.o 2025-09-07T08:05:52.7041460Z #34 3782.7 [188/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.cu.o 2025-09-07T08:06:14.3256750Z #34 3804.4 [189/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:06:20.4566100Z #34 3810.5 [190/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.cu.o 2025-09-07T08:07:24.9436173Z #34 3875.0 [191/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcap_sm90.cu.o 2025-09-07T08:07:28.6442872Z #34 3878.7 [192/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:09:40.3543730Z #34 4010.4 [193/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm90.cu.o 2025-09-07T08:09:47.3767567Z 
#34 4017.4 [194/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_sm90.cu.o 2025-09-07T08:09:47.6554430Z #34 4017.7 [195/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_softcap_sm90.cu.o 2025-09-07T08:10:20.5877725Z #34 4050.6 [196/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_packgqa_sm90.cu.o 2025-09-07T08:10:23.9524207Z #34 4054.0 [197/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_sm90.cu.o 2025-09-07T08:10:32.9446249Z #34 4063.0 [198/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:10:47.4913839Z #34 4077.5 [199/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_sm90.cu.o 2025-09-07T08:10:54.8468881Z #34 4084.9 [200/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcap_sm90.cu.o 2025-09-07T08:11:08.2010072Z #34 4098.2 [201/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:12:15.4631003Z #34 4165.5 [202/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:14:34.0215769Z #34 4304.1 [203/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_sm90.cu.o 2025-09-07T08:14:43.0202842Z #34 4313.1 [204/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_softcap_sm90.cu.o 2025-09-07T08:15:54.8444190Z #34 4384.9 [205/510] 
Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_sm90.cu.o 2025-09-07T08:16:47.0080886Z #34 4437.0 [206/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_packgqa_sm90.cu.o 2025-09-07T08:17:06.0047298Z #34 4456.0 [207/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcap_sm90.cu.o 2025-09-07T08:17:13.0624118Z #34 4463.1 [208/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_sm90.cu.o 2025-09-07T08:17:20.1606719Z #34 4470.2 [209/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:17:35.5327138Z #34 4485.6 [210/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_sm90.cu.o 2025-09-07T08:17:49.2264314Z #34 4499.3 [211/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:18:03.7325202Z #34 4513.8 [212/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:20:35.6669415Z #34 4665.7 [213/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_packgqa_sm90.cu.o 2025-09-07T08:20:41.6388669Z #34 4671.7 [214/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_sm90.cu.o 2025-09-07T08:21:01.2414502Z #34 4691.3 [215/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:21:06.6901372Z #34 4696.7 [216/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_sm90.cu.o 2025-09-07T08:21:12.7484428Z #34 4702.8 [217/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_sm90.cu.o 2025-09-07T08:21:20.6529411Z #34 4710.7 [218/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:21:33.7721907Z #34 4723.8 [219/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcap_sm90.cu.o 2025-09-07T08:21:48.7575817Z #34 4738.8 [220/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_sm90.cu.o 2025-09-07T08:21:50.3179634Z #34 4740.4 [221/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_softcap_sm90.cu.o 2025-09-07T08:22:31.9296480Z #34 4782.0 [222/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:24:22.1542381Z #34 4892.2 [223/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.cu.o 2025-09-07T08:24:31.7646576Z #34 4901.8 [224/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:24:35.3798082Z #34 4905.4 [225/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.cu.o 2025-09-07T08:24:49.1046493Z #34 4919.1 [226/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:24:52.0942158Z #34 4922.1 [227/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.cu.o 2025-09-07T08:25:15.9401358Z #34 4946.0 [228/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.cu.o 2025-09-07T08:25:26.2540299Z #34 4956.3 [229/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_sm90.cu.o 2025-09-07T08:25:33.0033143Z #34 4963.0 [230/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_softcap_sm90.cu.o 2025-09-07T08:25:35.4003339Z #34 4965.4 [231/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:25:57.2043266Z #34 4987.2 [232/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_softcap_sm90.cu.o 2025-09-07T08:28:15.6530019Z #34 5125.7 [233/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.cu.o 2025-09-07T08:28:27.2029829Z #34 5137.2 [234/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_softcap_sm90.cu.o 2025-09-07T08:28:57.4130487Z #34 5167.5 [235/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_packgqa_sm90.cu.o 2025-09-07T08:28:57.4143749Z #34 5167.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:28:57.4167203Z #34 5167.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:28:57.4190432Z #34 5167.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:28:57.4214219Z #34 5167.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:29:18.4165232Z #34 5188.5 [236/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_sm90.cu.o 2025-09-07T08:29:18.4172427Z #34 5188.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T08:29:18.4185759Z #34 5188.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T08:29:18.4194764Z #34 5188.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:29:18.4201578Z #34 5188.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:29:26.2109858Z #34 5196.2 [237/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:29:26.2121888Z #34 5196.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T08:29:26.2144135Z #34 5196.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T08:29:26.2162165Z #34 5196.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:29:26.2175923Z #34 5196.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:29:31.8336259Z #34 5201.9 [238/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_sm90.cu.o 2025-09-07T08:29:31.8344763Z #34 5201.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:29:31.8357391Z #34 5201.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:29:31.8369451Z #34 5201.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:29:31.8381463Z #34 5201.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:29:52.9862756Z #34 5223.0 [239/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_split_sm90.cu.o 2025-09-07T08:29:52.9869054Z #34 5223.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T08:29:52.9880444Z #34 5223.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T08:29:52.9889340Z #34 5223.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:29:52.9896633Z #34 5223.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:29:54.2657795Z #34 5224.3 [240/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_softcap_sm90.cu.o 2025-09-07T08:29:54.2664315Z #34 5224.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:29:54.2676311Z #34 5224.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:29:54.2688091Z #34 5224.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:29:54.2699949Z #34 5224.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:29:56.0951518Z #34 5226.1 [241/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:29:56.0958257Z #34 5226.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:29:56.0970253Z #34 5226.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:29:56.0982132Z #34 5226.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:29:56.0993723Z #34 5226.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:30:01.5634921Z #34 5231.6 [242/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:30:01.5641413Z #34 5231.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T08:30:01.5653206Z #34 5231.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T08:30:01.5662471Z #34 5231.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:01.5669513Z #34 5231.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:32:42.2764453Z #34 5392.3 [243/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_split_sm90.cu.o 2025-09-07T08:32:42.2771373Z #34 5392.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:32:42.2783266Z #34 5392.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:32:42.2794641Z #34 5392.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:32:42.2806068Z #34 5392.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:32:55.8391227Z #34 5405.9 [244/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_split_softcap_sm90.cu.o 2025-09-07T08:32:55.8399535Z #34 5405.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:32:55.8411676Z #34 5405.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:32:55.8423474Z #34 5405.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T08:32:55.8435190Z #34 5405.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T08:33:10.4931004Z #34 5420.5 [245/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_packgqa_sm90.cu.o 2025-09-07T08:33:36.0216344Z #34 5446.1 [246/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_sm90.cu.o 2025-09-07T08:33:39.8078081Z #34 5449.8 [247/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_sm90.cu.o 2025-09-07T08:33:45.5545044Z #34 5455.6 [248/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_softcap_sm90.cu.o 2025-09-07T08:33:47.1664753Z #34 5457.2 [249/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_softcap_sm90.cu.o 2025-09-07T08:34:02.4624891Z #34 5472.5 [250/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_split_sm90.cu.o 
2025-09-07T08:34:07.7460454Z #34 5477.8 [251/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T08:34:23.3180737Z #34 5493.4 [252/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T08:36:37.9450041Z #34 5628.0 [253/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.cu.o 2025-09-07T08:36:38.2499363Z #34 5628.3 [254/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:37:01.8288758Z #34 5651.9 [255/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_split_sm90.cu.o 2025-09-07T08:37:15.9337154Z #34 5666.0 [256/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_split_softcap_sm90.cu.o 2025-09-07T08:37:44.0019175Z #34 5694.0 [257/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:38:06.9721936Z #34 5717.0 [258/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.cu.o 2025-09-07T08:38:33.4384998Z #34 5743.5 [259/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_softcap_sm80.cu.o 2025-09-07T08:38:53.2576859Z #34 5763.3 [260/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcap_sm80.cu.o 2025-09-07T08:39:25.0735575Z #34 5795.1 [261/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.cu.o 2025-09-07T08:39:27.7380346Z #34 5797.8 [262/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_softcap_sm80.cu.o 2025-09-07T08:40:08.3357233Z #34 5838.4 [263/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:40:28.5581808Z #34 5858.6 [264/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcapall_sm80.cu.o 2025-09-07T08:40:29.2583011Z #34 5859.3 [265/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.cu.o 2025-09-07T08:40:33.9497300Z #34 5864.0 [266/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:40:57.6481319Z #34 5887.7 [267/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_softcap_sm80.cu.o 2025-09-07T08:42:02.2123804Z #34 5952.3 [268/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm80.cu.o 2025-09-07T08:42:23.3007765Z #34 5973.3 [269/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:42:47.0668545Z #34 5997.1 [270/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:43:02.8378046Z #34 6012.9 [271/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcap_sm80.cu.o 2025-09-07T08:43:49.7451494Z #34 6059.8 [272/510] Building CUDA 
object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm80.cu.o 2025-09-07T08:43:52.2639651Z #34 6062.3 [273/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_softcap_sm80.cu.o 2025-09-07T08:44:39.4765962Z #34 6109.5 [274/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:45:00.8656945Z #34 6130.9 [275/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcapall_sm80.cu.o 2025-09-07T08:45:09.4387008Z #34 6139.5 [276/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_sm80.cu.o 2025-09-07T08:45:27.4871178Z #34 6157.5 [277/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:46:12.8846339Z #34 6202.9 [278/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_softcap_sm80.cu.o 2025-09-07T08:46:45.4206095Z #34 6235.5 [279/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_sm80.cu.o 2025-09-07T08:47:03.0684185Z #34 6253.1 [280/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:47:58.6465987Z #34 6308.7 [281/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcap_sm80.cu.o 2025-09-07T08:48:02.4459250Z #34 6312.5 [282/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:48:55.1327702Z #34 6365.2 [283/510] 
Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_sm80.cu.o 2025-09-07T08:48:55.8602487Z #34 6365.9 [284/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_sm80.cu.o 2025-09-07T08:49:01.3489021Z #34 6371.4 [285/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_softcap_sm80.cu.o 2025-09-07T08:49:30.3776811Z #34 6400.4 [286/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:49:32.4166468Z #34 6402.5 [287/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_softcap_sm80.cu.o 2025-09-07T08:50:04.7954963Z #34 6434.8 [288/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcapall_sm80.cu.o 2025-09-07T08:50:27.5489135Z #34 6457.6 [289/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_sm80.cu.o 2025-09-07T08:50:30.7163927Z #34 6460.8 [290/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:50:34.4308240Z #34 6464.5 [291/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:50:50.0275900Z #34 6480.1 [292/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:51:39.1012792Z #34 6529.1 [293/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcap_sm80.cu.o 2025-09-07T08:51:55.2918888Z #34 6545.3 
[294/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_sm80.cu.o 2025-09-07T08:51:56.9987394Z #34 6547.0 [295/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_softcap_sm80.cu.o 2025-09-07T08:52:13.0496524Z #34 6563.1 [296/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:52:25.3275036Z #34 6575.4 [297/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcapall_sm80.cu.o 2025-09-07T08:53:12.6569331Z #34 6622.7 [298/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:53:18.7953633Z #34 6628.8 [299/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_sm80.cu.o 2025-09-07T08:53:24.2687823Z #34 6634.3 [300/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_softcap_sm80.cu.o 2025-09-07T08:53:25.1024709Z #34 6635.1 [301/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_sm80.cu.o 2025-09-07T08:54:10.8804877Z #34 6680.9 [302/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:54:21.6991267Z #34 6691.7 [303/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:54:44.1560902Z #34 6714.2 [304/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcap_sm80.cu.o 2025-09-07T08:54:55.4922376Z #34 6725.5 
[305/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_sm80.cu.o 2025-09-07T08:55:20.8123501Z #34 6750.9 [306/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:55:40.6441894Z #34 6770.7 [307/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcapall_sm80.cu.o 2025-09-07T08:55:46.2994378Z #34 6776.3 [308/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_softcap_sm80.cu.o 2025-09-07T08:56:36.8858621Z #34 6826.9 [309/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:59:05.0644619Z #34 6975.1 [310/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.cu.o 2025-09-07T08:59:38.0336085Z #34 7008.1 [311/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.cu.o 2025-09-07T08:59:56.0291582Z #34 7026.1 [312/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:00:02.2923555Z #34 7032.3 [313/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.cu.o 2025-09-07T09:00:16.3279255Z #34 7046.4 [314/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.cu.o 2025-09-07T09:00:24.2840876Z #34 7054.3 [315/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcap_sm90.cu.o 2025-09-07T09:00:31.9221731Z #34 7062.0 [316/510] Building CUDA 
object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:02:15.4856981Z #34 7165.5 [317/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:02:48.5002412Z #34 7198.5 [318/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.cu.o 2025-09-07T09:03:30.9165729Z #34 7241.0 [319/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_softcap_sm90.cu.o 2025-09-07T09:03:35.8710229Z #34 7245.9 [320/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.cu.o 2025-09-07T09:03:49.0701076Z #34 7259.1 [321/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_packgqa_sm90.cu.o 2025-09-07T09:03:50.3926085Z #34 7260.4 [322/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_sm90.cu.o 2025-09-07T09:04:01.7234121Z #34 7271.8 [323/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:04:07.8111188Z #34 7277.8 [324/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.cu.o 2025-09-07T09:04:19.2669520Z #34 7289.3 [325/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:05:15.9788637Z #34 7346.0 [326/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:05:39.3601283Z #34 7369.4 [327/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcap_sm90.cu.o 2025-09-07T09:07:37.2447911Z #34 7487.3 [328/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_sm90.cu.o 2025-09-07T09:07:42.5672891Z #34 7492.6 [329/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_sm90.cu.o 2025-09-07T09:08:25.0515026Z #34 7535.1 [330/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_softcap_sm90.cu.o 2025-09-07T09:08:25.2414472Z #34 7535.3 [331/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_packgqa_sm90.cu.o 2025-09-07T09:08:39.0101377Z #34 7549.0 [332/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_sm90.cu.o 2025-09-07T09:08:39.5832776Z #34 7549.6 [333/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:08:52.0123708Z #34 7562.1 [334/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_sm90.cu.o 2025-09-07T09:08:59.7894085Z #34 7569.8 [335/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcap_sm90.cu.o 2025-09-07T09:09:06.9151321Z #34 7577.0 [336/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:09:55.7719161Z #34 7625.8 [337/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:12:26.4446995Z #34 7776.5 [338/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_sm90.cu.o 2025-09-07T09:12:34.7079492Z #34 7784.7 [339/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_softcap_sm90.cu.o 2025-09-07T09:13:57.5790535Z #34 7867.6 [340/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_sm90.cu.o 2025-09-07T09:14:45.6780297Z #34 7915.7 [341/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcap_sm90.cu.o 2025-09-07T09:15:15.2941785Z #34 7945.3 [342/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_sm90.cu.o 2025-09-07T09:15:27.8424022Z #34 7957.9 [343/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_packgqa_sm90.cu.o 2025-09-07T09:15:37.0546083Z #34 7967.1 [344/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:15:43.1807305Z #34 7973.2 [345/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_sm90.cu.o 2025-09-07T09:15:54.3345838Z #34 7984.4 [346/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:15:59.1939237Z #34 7989.2 [347/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:18:34.9888327Z #34 8145.0 [348/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_packgqa_sm90.cu.o 2025-09-07T09:18:39.5568112Z #34 8149.6 [349/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_sm90.cu.o 2025-09-07T09:19:11.4296943Z #34 8181.5 [350/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:19:14.8738047Z #34 8184.9 [351/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_sm90.cu.o 2025-09-07T09:19:28.1949923Z #34 8198.2 [352/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_sm90.cu.o 2025-09-07T09:19:28.5039160Z #34 8198.5 [353/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcap_sm90.cu.o 2025-09-07T09:19:29.2447633Z #34 8199.3 [354/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_sm90.cu.o 2025-09-07T09:19:34.1999325Z #34 8204.2 [355/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_softcap_sm90.cu.o 2025-09-07T09:19:39.3765585Z #34 8209.4 [356/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:20:45.6526891Z #34 8275.7 [357/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:22:29.2477430Z #34 8379.3 [358/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.cu.o 2025-09-07T09:22:39.4776594Z #34 8389.5 [359/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:22:41.7961519Z #34 8391.8 [360/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.cu.o 2025-09-07T09:22:44.9664285Z #34 8395.0 [361/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:23:02.8860323Z #34 8412.9 [362/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.cu.o 2025-09-07T09:23:03.4544466Z #34 8413.5 [363/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.cu.o 2025-09-07T09:23:25.4396391Z #34 8435.5 [364/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_sm90.cu.o 2025-09-07T09:23:25.8589512Z #34 8435.9 [365/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:23:29.9971397Z #34 8440.0 [366/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_softcap_sm90.cu.o 2025-09-07T09:24:11.3840812Z #34 8481.4 [367/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_softcap_sm90.cu.o 2025-09-07T09:26:21.2587020Z #34 8611.3 [368/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.cu.o 2025-09-07T09:26:37.8639508Z #34 8627.9 [369/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_softcap_sm90.cu.o 2025-09-07T09:27:02.8121329Z #34 8652.8 [370/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_packgqa_sm90.cu.o 2025-09-07T09:27:02.8133735Z #34 8652.8 ptxas info : (C7520) 
Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:27:02.8156295Z #34 8652.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:27:02.8178911Z #34 8652.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:27:02.8200936Z #34 8652.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:27:15.6062559Z #34 8665.6 [371/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_sm90.cu.o 2025-09-07T09:27:15.6074142Z #34 8665.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T09:27:15.6089071Z #34 8665.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T09:27:15.6102957Z #34 8665.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:27:15.6116317Z #34 8665.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:27:22.9514677Z #34 8673.0 [372/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_sm90.cu.o 2025-09-07T09:27:22.9521368Z #34 8673.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:27:22.9533341Z #34 8673.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:27:22.9545211Z #34 8673.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:27:22.9557406Z #34 8673.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:27:36.6419245Z #34 8686.7 [373/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:27:36.6425870Z #34 8686.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T09:27:36.6437016Z #34 8686.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T09:27:36.6447927Z #34 8686.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:27:36.6455466Z #34 8686.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:27:43.1851568Z #34 8693.2 [374/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_split_sm90.cu.o 2025-09-07T09:27:43.1863720Z #34 8693.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T09:27:43.1885152Z #34 8693.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T09:27:43.1902296Z #34 8693.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:27:43.1915585Z #34 8693.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:27:48.1680529Z #34 8698.2 [375/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:27:48.1693596Z #34 8698.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:27:48.1715412Z #34 8698.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:27:48.1731437Z #34 8698.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:27:48.1754485Z #34 8698.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:28:01.5189323Z #34 8711.6 [376/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:28:01.5198929Z #34 8711.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T09:28:01.5215947Z #34 8711.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T09:28:01.5230629Z #34 8711.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:28:01.5242009Z #34 8711.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T09:28:07.8400573Z #34 8717.9 [377/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_softcap_sm90.cu.o 2025-09-07T09:28:07.8412590Z #34 8717.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:28:07.8431772Z #34 8717.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:28:07.8447714Z #34 8717.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:28:07.8464024Z #34 8717.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:30:48.1921566Z #34 8878.2 [378/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_split_sm90.cu.o 2025-09-07T09:30:48.1928134Z #34 8878.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:30:48.1940216Z #34 8878.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:30:48.1952352Z #34 8878.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:30:48.1964189Z #34 8878.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:31:05.8984033Z #34 8895.9 [379/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_split_softcap_sm90.cu.o 2025-09-07T09:31:05.8990839Z #34 8895.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:31:05.9004302Z #34 8895.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:31:05.9016224Z #34 8895.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T09:31:05.9027966Z #34 8895.9 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T09:31:16.2091497Z #34 8906.2 [380/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_packgqa_sm90.cu.o 2025-09-07T09:31:34.7411691Z #34 8924.8 [381/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_sm90.cu.o 2025-09-07T09:31:37.4011659Z #34 8927.4 [382/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_sm90.cu.o 2025-09-07T09:31:44.5968038Z #34 8934.6 [383/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_softcap_sm90.cu.o 2025-09-07T09:31:51.4790603Z #34 8941.5 [384/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_softcap_sm90.cu.o 2025-09-07T09:32:06.6640802Z #34 8956.7 [385/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_split_sm90.cu.o 2025-09-07T09:32:12.2354924Z #34 8962.3 [386/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T09:32:14.3396138Z #34 8964.4 [387/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T09:34:36.7158140Z #34 9106.8 [388/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.cu.o 2025-09-07T09:34:44.0105825Z #34 9114.0 [389/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T09:35:08.5288939Z #34 9138.6 [390/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_split_sm90.cu.o 2025-09-07T09:35:27.5633184Z #34 9157.6 [391/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_split_softcap_sm90.cu.o 2025-09-07T09:35:44.6560126Z #34 9174.7 [392/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T09:36:13.4693366Z #34 9203.5 [393/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.cu.o 2025-09-07T09:36:32.1011529Z #34 9222.1 [394/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_softcap_sm80.cu.o 2025-09-07T09:36:59.0735559Z #34 9249.1 [395/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcap_sm80.cu.o 2025-09-07T09:37:23.8622850Z #34 9273.9 
[396/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.cu.o 2025-09-07T09:37:36.2372834Z #34 9286.3 [397/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_softcap_sm80.cu.o 2025-09-07T09:38:00.5250202Z #34 9310.6 [398/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_softcapall_sm80.cu.o 2025-09-07T09:38:18.9513701Z #34 9329.0 [399/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcapall_sm80.cu.o 2025-09-07T09:38:39.1611372Z #34 9349.2 [400/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm80.cu.o 2025-09-07T09:38:39.3899384Z #34 9349.4 [401/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_softcapall_sm80.cu.o 2025-09-07T09:38:55.1881804Z #34 9365.2 [402/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_softcap_sm80.cu.o 2025-09-07T09:40:01.3005341Z #34 9431.3 [403/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm80.cu.o 2025-09-07T09:40:31.1302908Z #34 9461.2 [404/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T09:40:52.8121923Z #34 9482.9 [405/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_softcapall_sm80.cu.o 2025-09-07T09:41:11.2436895Z #34 9501.3 [406/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcap_sm80.cu.o 2025-09-07T09:41:39.5367899Z #34 
9529.6 [407/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_sm80.cu.o 2025-09-07T09:42:04.9919913Z #34 9555.0 [408/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_softcap_sm80.cu.o 2025-09-07T09:42:35.4613686Z #34 9585.5 [409/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T09:42:52.2944973Z #34 9602.3 [410/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcapall_sm80.cu.o 2025-09-07T09:43:02.0884189Z #34 9612.1 [411/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_sm80.cu.o 2025-09-07T09:43:30.0000085Z #34 9640.0 [412/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_softcapall_sm80.cu.o 2025-09-07T09:44:11.9446737Z #34 9682.0 [413/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_softcap_sm80.cu.o 2025-09-07T09:44:53.4545043Z #34 9723.5 [414/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_sm80.cu.o 2025-09-07T09:45:11.1475688Z #34 9741.2 [415/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T09:46:07.4832172Z #34 9797.5 [416/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcap_sm80.cu.o 2025-09-07T09:46:09.5354775Z #34 9799.6 [417/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_softcapall_sm80.cu.o 
2025-09-07T09:46:43.5605028Z #34 9833.6 [418/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_sm80.cu.o 2025-09-07T09:46:51.6124629Z #34 9841.7 [419/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_softcap_sm80.cu.o 2025-09-07T09:46:54.5549702Z #34 9844.6 [420/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_sm80.cu.o 2025-09-07T09:47:17.9794300Z #34 9868.0 [421/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T09:47:40.5385881Z #34 9890.6 [422/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_softcap_sm80.cu.o 2025-09-07T09:47:59.2349026Z #34 9909.3 [423/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcapall_sm80.cu.o 2025-09-07T09:48:37.3227466Z #34 9947.4 [424/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_sm80.cu.o 2025-09-07T09:48:41.0317766Z #34 9951.1 [425/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_softcapall_sm80.cu.o 2025-09-07T09:48:41.2205211Z #34 9951.3 [426/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T09:48:58.6545648Z #34 9968.7 [427/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_softcapall_sm80.cu.o 2025-09-07T09:49:35.5515632Z #34 10005.6 [428/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcap_sm80.cu.o 2025-09-07T09:49:43.6211134Z #34 10013.7 [429/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_sm80.cu.o 2025-09-07T09:49:59.4100644Z #34 10029.4 [430/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T09:50:06.2104085Z #34 10036.2 [431/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_softcap_sm80.cu.o 2025-09-07T09:50:18.1426057Z #34 10048.2 [432/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcapall_sm80.cu.o 2025-09-07T09:51:03.6417753Z #34 10093.7 [433/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_softcapall_sm80.cu.o 2025-09-07T09:51:29.8831914Z #34 10119.9 [434/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_sm80.cu.o 2025-09-07T09:51:32.2140997Z #34 10122.3 [435/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_softcap_sm80.cu.o 2025-09-07T09:51:38.1496870Z #34 10128.2 [436/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_sm80.cu.o 2025-09-07T09:52:13.0362772Z #34 10163.1 [437/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T09:52:15.7361545Z #34 10165.8 [438/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_softcapall_sm80.cu.o 2025-09-07T09:52:46.2291361Z #34 10196.3 [439/510] Building CUDA 
object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcap_sm80.cu.o 2025-09-07T09:52:47.6520071Z #34 10197.7 [440/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_sm80.cu.o 2025-09-07T09:53:09.3525926Z #34 10219.4 [441/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T09:53:31.5224696Z #34 10241.6 [442/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcapall_sm80.cu.o 2025-09-07T09:53:35.2409900Z #34 10245.3 [443/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_softcap_sm80.cu.o 2025-09-07T09:54:44.7960673Z #34 10314.8 [444/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_softcapall_sm80.cu.o 2025-09-07T09:55:32.5866445Z #34 10362.6 [445/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_packgqa_sm90.cu.o 2025-09-07T09:55:34.3686790Z #34 10364.4 [446/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_sm90.cu.o 2025-09-07T09:55:38.7725961Z #34 10368.8 [447/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_sm90.cu.o 2025-09-07T09:56:15.0109475Z #34 10405.0 [448/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_softcap_sm90.cu.o 2025-09-07T09:56:24.1885808Z #34 10414.2 [449/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_split_sm90.cu.o 2025-09-07T09:56:54.5788725Z #34 10444.6 [450/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T09:57:08.0254953Z #34 10458.1 [451/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_softcap_sm90.cu.o 2025-09-07T09:57:41.0272068Z #34 10491.1 [452/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_split_sm90.cu.o 2025-09-07T09:58:02.8328347Z #34 10512.9 [453/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T09:59:38.5141519Z #34 10608.6 [454/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_split_softcap_sm90.cu.o 2025-09-07T09:59:55.8740125Z #34 10625.9 [455/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_sm90.cu.o 2025-09-07T09:59:56.3270610Z #34 10626.4 [456/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_packgqa_sm90.cu.o 2025-09-07T10:00:30.9729308Z #34 10661.0 [457/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_sm90.cu.o 2025-09-07T10:00:38.4851010Z #34 10668.5 [458/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_softcap_sm90.cu.o 2025-09-07T10:01:21.5357181Z #34 10711.6 [459/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_softcap_sm90.cu.o 2025-09-07T10:01:22.9129186Z #34 10713.0 [460/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_split_sm90.cu.o 2025-09-07T10:01:30.3848593Z #34 10720.4 [461/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T10:02:05.8671093Z #34 10755.9 [462/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T10:02:26.3430623Z #34 10776.4 [463/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_split_sm90.cu.o 2025-09-07T10:04:39.9026211Z #34 10909.9 [464/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_sm90.cu.o 2025-09-07T10:04:45.1368682Z #34 10915.2 [465/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_split_softcap_sm90.cu.o 2025-09-07T10:04:48.3213837Z #34 10918.4 [466/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_packgqa_sm90.cu.o 2025-09-07T10:04:53.6309477Z #34 10923.7 [467/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_sm90.cu.o 2025-09-07T10:05:11.1739824Z #34 10941.2 [468/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_softcap_sm90.cu.o 2025-09-07T10:05:30.6167424Z #34 10960.7 [469/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_split_sm90.cu.o 2025-09-07T10:05:34.0242593Z #34 10964.1 [470/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_softcap_sm90.cu.o 2025-09-07T10:06:14.9957277Z #34 11005.0 [471/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T10:06:33.2866943Z #34 11023.3 [472/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T10:07:22.5895373Z #34 11072.6 [473/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_split_sm90.cu.o 2025-09-07T10:08:01.8844839Z #34 11111.9 [474/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_sm90.cu.o 2025-09-07T10:08:12.3417999Z #34 11122.4 [475/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_packgqa_sm90.cu.o 2025-09-07T10:08:24.4669527Z #34 11134.5 [476/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_sm90.cu.o 2025-09-07T10:08:31.1127728Z #34 11141.2 [477/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_softcap_sm90.cu.o 2025-09-07T10:08:54.9445003Z #34 11165.0 [478/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_split_sm90.cu.o 2025-09-07T10:09:00.8997983Z #34 11170.9 [479/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_softcap_sm90.cu.o 2025-09-07T10:09:13.3860500Z #34 11183.4 [480/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T10:09:46.0435682Z #34 11216.1 [481/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_split_softcap_sm90.cu.o 2025-09-07T10:09:46.9080568Z #34 11216.9 [482/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T10:10:52.6007644Z #34 11282.6 [483/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_split_sm90.cu.o 2025-09-07T10:11:35.1613535Z #34 11325.2 [484/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_split_softcap_sm90.cu.o 2025-09-07T10:12:00.6978259Z #34 11350.7 [485/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_sm90.cu.o 2025-09-07T10:12:04.0930780Z #34 11354.1 [486/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_packgqa_sm90.cu.o 2025-09-07T10:12:18.4661392Z #34 11368.5 [487/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_sm90.cu.o 2025-09-07T10:12:29.0651924Z #34 11379.1 [488/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_softcap_sm90.cu.o 2025-09-07T10:12:30.7702471Z #34 11380.8 [489/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_softcap_sm90.cu.o 2025-09-07T10:12:54.1345946Z #34 11404.2 [490/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_split_sm90.cu.o 2025-09-07T10:13:02.9381909Z #34 11413.0 [491/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T10:13:43.1452576Z #34 11453.2 [492/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T10:14:44.0998422Z #34 11514.1 [493/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_split_sm90.cu.o 2025-09-07T10:15:28.8940101Z #34 11558.9 [494/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_packgqa_sm90.cu.o 2025-09-07T10:15:34.4498185Z #34 11564.5 [495/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_split_softcap_sm90.cu.o 2025-09-07T10:15:34.9527474Z #34 11565.0 [496/510] Linking CXX shared module cumem_allocator.abi3.so 2025-09-07T10:15:36.9204999Z #34 11567.0 [497/510] Linking CXX shared module _C.abi3.so 2025-09-07T10:15:38.0585760Z #34 11568.1 [498/510] Linking CXX shared module _moe_C.abi3.so 2025-09-07T10:15:38.6357449Z #34 11568.7 [499/510] Linking CXX shared module _flashmla_C.abi3.so 2025-09-07T10:15:39.9863383Z #34 11570.0 [500/510] Linking CXX shared module vllm-flash-attn/_vllm_fa2_C.abi3.so 2025-09-07T10:16:06.1136271Z #34 11596.2 [501/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_sm90.cu.o 2025-09-07T10:16:22.5514646Z #34 11612.6 [502/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_sm90.cu.o 2025-09-07T10:16:34.7827397Z #34 11624.8 [503/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_softcap_sm90.cu.o 2025-09-07T10:16:47.7003943Z #34 11637.7 [504/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_split_sm90.cu.o 2025-09-07T10:16:48.7674114Z #34 11638.8 [505/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T10:17:07.0456935Z #34 11657.1 [506/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T10:17:26.0551941Z #34 11676.1 [507/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_softcap_sm90.cu.o 2025-09-07T10:17:59.0765876Z #34 11709.1 [508/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_split_sm90.cu.o 2025-09-07T10:18:48.2120981Z #34 11758.3 [509/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_split_softcap_sm90.cu.o 2025-09-07T10:18:51.3496486Z #34 11761.4 [510/510] Linking CXX shared module vllm-flash-attn/_vllm_fa3_C.abi3.so 2025-09-07T10:18:51.5231918Z #34 11761.6 -- Install configuration: "Release" 2025-09-07T10:18:51.6772904Z #34 11761.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/_moe_C.abi3.so 2025-09-07T10:18:51.6773959Z #34 11761.6 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/_moe_C.abi3.so" to "" 2025-09-07T10:18:51.6974668Z #34 11761.7 -- Install configuration: "Release" 2025-09-07T10:18:51.8515620Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so 2025-09-07T10:18:51.8516829Z #34 11761.7 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so" to "" 2025-09-07T10:18:51.8517887Z #34 11761.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T10:18:51.8518736Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py 2025-09-07T10:18:51.8519688Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py 2025-09-07T10:18:51.8520599Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers 2025-09-07T10:18:51.8521689Z #34 11761.7 -- Installing: 
/workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py 2025-09-07T10:18:51.8522629Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py 2025-09-07T10:18:51.8523504Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops 2025-09-07T10:18:51.8524506Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:51.8525428Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py 2025-09-07T10:18:51.8526410Z #34 11761.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py 2025-09-07T10:18:51.8719413Z #34 11761.9 -- Install configuration: "Release" 2025-09-07T10:18:52.0228233Z #34 11761.9 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so 2025-09-07T10:18:55.6263504Z #34 11765.7 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so" to "" 2025-09-07T10:18:55.7772449Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T10:18:55.7773386Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py 2025-09-07T10:18:55.7774354Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py 2025-09-07T10:18:55.7775284Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers 2025-09-07T10:18:55.7776178Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py 2025-09-07T10:18:55.7777100Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py 
2025-09-07T10:18:55.7777978Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops 2025-09-07T10:18:55.7778818Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:55.7779735Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py 2025-09-07T10:18:55.7780720Z #34 11765.7 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py 2025-09-07T10:18:55.8020365Z #34 11765.8 -- Install configuration: "Release" 2025-09-07T10:18:55.9552879Z #34 11765.8 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/_flashmla_C.abi3.so 2025-09-07T10:18:55.9553947Z #34 11765.8 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/_flashmla_C.abi3.so" to "" 2025-09-07T10:18:55.9749508Z #34 11766.0 -- Install configuration: "Release" 2025-09-07T10:18:55.9750252Z #34 11766.0 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/cumem_allocator.abi3.so 2025-09-07T10:18:56.1283375Z #34 11766.0 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/cumem_allocator.abi3.so" to "" 2025-09-07T10:18:56.1490014Z #34 11766.2 -- Install configuration: "Release" 2025-09-07T10:18:56.1490678Z #34 11766.2 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/_C.abi3.so 2025-09-07T10:18:56.2491741Z #34 11766.2 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/_C.abi3.so" to "" 2025-09-07T10:18:56.2492948Z #34 11766.2 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py to vllm/vllm_flash_attn/__init__.py 2025-09-07T10:18:56.2493997Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py -> vllm/vllm_flash_attn 2025-09-07T10:18:56.2495328Z #34 11766.2 Copying 
build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py to vllm/vllm_flash_attn/flash_attn_interface.py 2025-09-07T10:18:56.2496548Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py -> vllm/vllm_flash_attn 2025-09-07T10:18:56.2497701Z #34 11766.2 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py to vllm/vllm_flash_attn/layers/__init__.py 2025-09-07T10:18:56.2498987Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py -> vllm/vllm_flash_attn/layers 2025-09-07T10:18:56.2500148Z #34 11766.2 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py to vllm/vllm_flash_attn/layers/rotary.py 2025-09-07T10:18:56.2501284Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py -> vllm/vllm_flash_attn/layers 2025-09-07T10:18:56.2502476Z #34 11766.2 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py to vllm/vllm_flash_attn/ops/triton/__init__.py 2025-09-07T10:18:56.2503829Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py -> vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:56.2505017Z #34 11766.2 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py to vllm/vllm_flash_attn/ops/triton/rotary.py 2025-09-07T10:18:56.2506218Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py -> vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:56.2507559Z #34 11766.2 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:90: SetuptoolsDeprecationWarning: setup.py install is deprecated. 2025-09-07T10:18:56.2508456Z #34 11766.2 !! 
2025-09-07T10:18:56.2508694Z #34 11766.2 2025-09-07T10:18:56.2508991Z #34 11766.2 ******************************************************************************** 2025-09-07T10:18:56.2509480Z #34 11766.2 Please avoid running ``setup.py`` directly. 2025-09-07T10:18:56.2509973Z #34 11766.2 Instead, use pypa/build, pypa/installer or other 2025-09-07T10:18:56.2510423Z #34 11766.2 standards-based tools. 2025-09-07T10:18:56.2510745Z #34 11766.2 2025-09-07T10:18:56.2511244Z #34 11766.2 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details. 2025-09-07T10:18:56.2511895Z #34 11766.2 ******************************************************************************** 2025-09-07T10:18:56.2512276Z #34 11766.2 2025-09-07T10:18:56.2512508Z #34 11766.2 !! 2025-09-07T10:18:56.2512825Z #34 11766.2 self.initialize_options() 2025-09-07T10:18:56.2513388Z #34 11766.2 installing to build/bdist.linux-x86_64/wheel 2025-09-07T10:18:56.2513856Z #34 11766.2 running install 2025-09-07T10:18:56.2514161Z #34 11766.2 running install_lib 2025-09-07T10:18:56.2514566Z #34 11766.2 creating build/bdist.linux-x86_64/wheel 2025-09-07T10:18:56.2515069Z #34 11766.2 creating build/bdist.linux-x86_64/wheel/vllm 2025-09-07T10:18:56.2515842Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2516825Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/_custom_ops.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2517885Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/_ipex_ops.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2518906Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/beam_search.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2519964Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/collect_env.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2521017Z #34 11766.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/connections.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2522065Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/env_override.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2523129Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/envs.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2524161Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/forward_context.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2525206Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/logger.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2526313Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/logits_process.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2527414Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/logprobs.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2528384Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/outputs.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2529483Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/pooling_params.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2530522Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/sampling_params.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2531939Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/scalar_type.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2532945Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/scripts.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2534000Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/sequence.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2535046Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/tasks.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2536075Z #34 
11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/test_utils.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2537135Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/tracing.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2538162Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/version.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2539207Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/_version.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.2540014Z #34 11766.2 creating build/bdist.linux-x86_64/wheel/vllm/adapter_commons 2025-09-07T10:18:56.2541072Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T10:18:56.2542414Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/layers.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T10:18:56.2543904Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/models.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T10:18:56.2545253Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/request.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T10:18:56.2546552Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T10:18:56.2547891Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/worker_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T10:18:56.2549060Z #34 11766.2 creating build/bdist.linux-x86_64/wheel/vllm/assets 2025-09-07T10:18:56.2550065Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/assets/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T10:18:56.2551272Z #34 11766.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/assets/audio.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T10:18:56.2552372Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/assets/base.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T10:18:56.2553580Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/assets/image.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T10:18:56.2554792Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/assets/video.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T10:18:56.2555697Z #34 11766.2 creating build/bdist.linux-x86_64/wheel/vllm/attention 2025-09-07T10:18:56.2556546Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention 2025-09-07T10:18:56.2557868Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layer.py -> build/bdist.linux-x86_64/wheel/./vllm/attention 2025-09-07T10:18:56.2559169Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/selector.py -> build/bdist.linux-x86_64/wheel/./vllm/attention 2025-09-07T10:18:56.2560149Z #34 11766.2 creating build/bdist.linux-x86_64/wheel/vllm/attention/backends 2025-09-07T10:18:56.2561329Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2562733Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/abstract.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2564299Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/differential_flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2565907Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/dual_chunk_flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 
2025-09-07T10:18:56.2567426Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2568846Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/flashmla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2570330Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/placeholder_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2572113Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/rocm_aiter_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2573604Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/rocm_flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2575107Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/triton_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2576651Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2578161Z #34 11766.2 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/xformers.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T10:18:56.2579204Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/attention/backends/mla 2025-09-07T10:18:56.2580395Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends/mla 2025-09-07T10:18:56.2581906Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla/common.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends/mla 2025-09-07T10:18:56.2583021Z #34 11766.3 creating 
build/bdist.linux-x86_64/wheel/vllm/attention/layers 2025-09-07T10:18:56.2584197Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/layers 2025-09-07T10:18:56.2585618Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layers/chunked_local_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/layers 2025-09-07T10:18:56.2587164Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layers/encoder_only_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/layers 2025-09-07T10:18:56.2588239Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/attention/ops 2025-09-07T10:18:56.2589179Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2590672Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/chunked_prefill_paged_decode.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2592156Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/common.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2593430Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/flashmla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2594777Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/merge_attn_states.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2596167Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/paged_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2597514Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/pallas_kv_cache_update.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2598959Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/attention/ops/prefix_prefill.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2600293Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/rocm_aiter_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2601661Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/rocm_aiter_paged_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2603103Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_decode_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2604582Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_flash_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2606077Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_merge_attn_states.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2607576Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_unified_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T10:18:56.2608692Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/attention/utils 2025-09-07T10:18:56.2609663Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/utils 2025-09-07T10:18:56.2611114Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/utils/fa_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/utils 2025-09-07T10:18:56.2612696Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/attention/utils/kv_sharing_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/utils 2025-09-07T10:18:56.2613742Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/benchmarks 2025-09-07T10:18:56.2614665Z #34 11766.3 
copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T10:18:56.2615982Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/datasets.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T10:18:56.2617193Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/latency.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T10:18:56.2618499Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/serve.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T10:18:56.2619799Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/throughput.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T10:18:56.2620789Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/benchmarks/lib 2025-09-07T10:18:56.2621783Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T10:18:56.2623418Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/endpoint_request_func.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T10:18:56.2624847Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/ready_checker.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T10:18:56.2626211Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T10:18:56.2627108Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/compilation 2025-09-07T10:18:56.2628093Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2629406Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/activation_quant_fusion.py -> 
build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2630762Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/backends.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2632114Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/base_static_graph.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2633400Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/collective_fusion.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2634823Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/compiler_interface.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2636178Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/counter.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2637429Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/cuda_graph.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2638771Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/cuda_piecewise_backend.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2640205Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/decorators.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2641573Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/fix_functionalization.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2642936Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/fusion.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2644180Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/fusion_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2645454Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/compilation/fx_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2646708Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/inductor_pass.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2648044Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/monitor.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2649740Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/multi_output_match.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2651240Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/noop_elimination.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2652614Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/pass_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2653995Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/sequence_parallelism.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2655601Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/torch25_custom_graph_pass.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2657038Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/vllm_inductor_pass.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2658434Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/wrapper.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T10:18:56.2659374Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/config 2025-09-07T10:18:56.2660254Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/config/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T10:18:56.2661341Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/config/cache.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T10:18:56.2662589Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/config/compilation.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T10:18:56.2663878Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/config/parallel.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T10:18:56.2665051Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/config/scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T10:18:56.2666251Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/config/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T10:18:56.2667029Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/core 2025-09-07T10:18:56.2667856Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T10:18:56.2668998Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T10:18:56.2670148Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/evictor.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T10:18:56.2671224Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T10:18:56.2672564Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/placeholder_block_space_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T10:18:56.2673861Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T10:18:56.2674679Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/core/block 2025-09-07T10:18:56.2675565Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2676841Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/block_table.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2678062Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/common.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2679348Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/cpu_gpu_block_allocator.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2680668Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2681965Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/naive_block.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2683261Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/prefix_caching_block.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2684535Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T10:18:56.2685567Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/device_allocator 2025-09-07T10:18:56.2686550Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/device_allocator/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/device_allocator 2025-09-07T10:18:56.2687912Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/device_allocator/cumem.py -> build/bdist.linux-x86_64/wheel/./vllm/device_allocator 2025-09-07T10:18:56.2688882Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed 2025-09-07T10:18:56.2689808Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T10:18:56.2691339Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/distributed/communication_op.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T10:18:56.2692682Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_events.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T10:18:56.2694059Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/parallel_state.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T10:18:56.2695395Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/tpu_distributed_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T10:18:56.2696794Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T10:18:56.2697770Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/device_communicators 2025-09-07T10:18:56.2699108Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2700880Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/all2all.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2702693Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/all_reduce_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2704657Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/base_device_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2706508Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/cpu_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 
2025-09-07T10:18:56.2708284Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/cuda_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2710091Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/cuda_wrapper.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2711881Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/custom_all_reduce.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2713597Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/pynccl.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2715421Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/pynccl_wrapper.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2717212Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/quick_all_reduce.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2718965Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/ray_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2720839Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/shm_broadcast.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2722587Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/symm_mem.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2724362Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/tpu_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2726179Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/xpu_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T10:18:56.2727383Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/eplb 2025-09-07T10:18:56.2728397Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T10:18:56.2729816Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/eplb_state.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T10:18:56.2731492Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/rebalance_algo.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T10:18:56.2733005Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/rebalance_execute.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T10:18:56.2734141Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer 2025-09-07T10:18:56.2735283Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T10:18:56.2736907Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_transfer_state.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T10:18:56.2738532Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_connector 2025-09-07T10:18:56.2739928Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T10:18:56.2741671Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T10:18:56.2743579Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/factory.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T10:18:56.2745293Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T10:18:56.2746638Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2747940Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2750110Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2752047Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2754100Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2756132Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2758134Z 
#34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/shared_storage_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T10:18:56.2759677Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T10:18:56.2761271Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T10:18:56.2763216Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T10:18:56.2765219Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T10:18:56.2767212Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T10:18:56.2768620Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T10:18:56.2769957Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T10:18:56.2772027Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T10:18:56.2773989Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/mooncake_store.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T10:18:56.2775971Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/simple_buffer.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T10:18:56.2777276Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T10:18:56.2778564Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T10:18:56.2780265Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T10:18:56.2782003Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T10:18:56.2783806Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/pynccl_pipe.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T10:18:56.2785463Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/README.md -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T10:18:56.2787031Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/disagg_prefill_workflow.jpg -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T10:18:56.2788207Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/engine 2025-09-07T10:18:56.2789064Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 
2025-09-07T10:18:56.2790186Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/arg_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2791427Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/async_llm_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2792642Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/async_timeout.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2793806Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/llm_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2794961Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/metrics.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2796122Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/metrics_types.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2797308Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/protocol.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T10:18:56.2798249Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/engine/multiprocessing 2025-09-07T10:18:56.2799355Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/multiprocessing 2025-09-07T10:18:56.2800911Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing/client.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/multiprocessing 2025-09-07T10:18:56.2802406Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing/engine.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/multiprocessing 2025-09-07T10:18:56.2803518Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/engine/output_processor 2025-09-07T10:18:56.2804728Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T10:18:56.2806241Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T10:18:56.2807771Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/single_step.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T10:18:56.2809353Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/stop_checker.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T10:18:56.2810903Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/util.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T10:18:56.2812195Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints 2025-09-07T10:18:56.2813103Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2814443Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/api_server.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2815759Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/chat_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2817107Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/constants.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2818490Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/context.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2819808Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/harmony_utils.py -> 
build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2821244Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/launcher.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2822529Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/llm.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2823884Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/logger.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2825137Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/renderer.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2826464Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/score_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2827694Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/ssl.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2828960Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/tool.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2830217Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/tool_server.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2831498Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T10:18:56.2832402Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/cli 2025-09-07T10:18:56.2833429Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T10:18:56.2834769Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/collect_env.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 
2025-09-07T10:18:56.2836105Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/main.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T10:18:56.2837523Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/openai.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T10:18:56.2838913Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/run_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T10:18:56.2840235Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/serve.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T10:18:56.2841532Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/types.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T10:18:56.2842609Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2843701Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2845291Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/base.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2846899Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/latency.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2848528Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/main.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2850430Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/serve.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2852208Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/throughput.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T10:18:56.2853514Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/openai 2025-09-07T10:18:56.2854608Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2856080Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/api_server.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2857546Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/cli_args.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2859109Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/logits_processors.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2860665Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/protocol.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2862132Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/run_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2863739Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_chat.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2865290Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_classification.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2866849Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_completion.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2868430Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_embedding.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.2869967Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3491344Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_models.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3493158Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_pooling.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3494771Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_responses.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3496329Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_score.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3497953Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_tokenization.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3499630Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_transcription.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3501264Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/speech_to_text.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T10:18:56.3502465Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3503909Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3505687Z 
#34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/abstract_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3507646Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3509602Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/deepseekv3_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3511437Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/glm4_moe_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3513269Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3515118Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/granite_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3516942Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3518814Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3520770Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/internlm2_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3522594Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/jamba_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3524389Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3526344Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3528194Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3529998Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/minimax_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3532110Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3533994Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/openai_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3535875Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3537802Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3539771Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3541652Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/seed_oss_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3543741Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/step3_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3545464Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3547199Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/xlam_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T10:18:56.3548366Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/executor 2025-09-07T10:18:56.3549536Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3550778Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/executor_base.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3552180Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/mp_distributed_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3553507Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/msgspec_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3554897Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/multiproc_worker_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3556343Z #34 11766.3 
copying build/lib.linux-x86_64-cpython-312/vllm/executor/ray_distributed_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3557645Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/ray_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3558976Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/executor/uniproc_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T10:18:56.3559929Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/inputs 2025-09-07T10:18:56.3560832Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T10:18:56.3562141Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/data.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T10:18:56.3563247Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/parse.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T10:18:56.3564419Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/preprocess.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T10:18:56.3565639Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T10:18:56.3566545Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/logging_utils 2025-09-07T10:18:56.3567507Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/logging_utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/logging_utils 2025-09-07T10:18:56.3568800Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/logging_utils/dump_input.py -> build/bdist.linux-x86_64/wheel/./vllm/logging_utils 2025-09-07T10:18:56.3570091Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/logging_utils/formatter.py -> build/bdist.linux-x86_64/wheel/./vllm/logging_utils 2025-09-07T10:18:56.3571095Z #34 11766.3 creating 
build/bdist.linux-x86_64/wheel/vllm/lora 2025-09-07T10:18:56.3572164Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3573366Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/fully_sharded_layers.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3574610Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/layers.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3575845Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/lora.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3576968Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/models.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3578166Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/peft_helper.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3579317Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/request.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3580485Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/resolver.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3581644Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3582793Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/worker_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T10:18:56.3583788Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops 2025-09-07T10:18:56.3584656Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops 2025-09-07T10:18:56.3585567Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/ipex_ops 2025-09-07T10:18:56.3586630Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/ipex_ops 2025-09-07T10:18:56.3587965Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops/lora_ops.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/ipex_ops 2025-09-07T10:18:56.3588975Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/torch_ops 2025-09-07T10:18:56.3590056Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/torch_ops 2025-09-07T10:18:56.3591405Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops/lora_ops.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/torch_ops 2025-09-07T10:18:56.3592515Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/triton_ops 2025-09-07T10:18:56.3593563Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T10:18:56.3595014Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/kernel_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T10:18:56.3596503Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/lora_expand_op.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T10:18:56.3597985Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/lora_kernel_metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T10:18:56.3599519Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/lora_shrink_op.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T10:18:56.3600984Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 
2025-09-07T10:18:56.3602005Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/xla_ops 2025-09-07T10:18:56.3603005Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/xla_ops 2025-09-07T10:18:56.3604313Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops/lora_ops.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/xla_ops 2025-09-07T10:18:56.3605325Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/lora/punica_wrapper 2025-09-07T10:18:56.3606426Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3607886Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_base.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3609341Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_cpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3610892Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_gpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3612602Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_selector.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3614145Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_tpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3615621Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_xpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3617107Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/utils.py -> 
build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T10:18:56.3618153Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor 2025-09-07T10:18:56.3619135Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T10:18:56.3620494Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/custom_op.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T10:18:56.3621854Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/parameter.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T10:18:56.3623372Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/sampling_metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T10:18:56.3624791Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T10:18:56.3625791Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers 2025-09-07T10:18:56.3626849Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3628345Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/activation.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3629914Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/attention_layer_base.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3631472Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/layernorm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3632991Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/lightning_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3634457Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/linear.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3636009Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/logits_processor.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3637519Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mla.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3638957Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/pooler.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3640475Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/resampler.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3641971Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3643436Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3644999Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/vocab_parallel_embedding.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T10:18:56.3646197Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3652524Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3654331Z #34 
11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3656206Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/batched_triton_or_deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3658032Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/config.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3659768Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/cpu_fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3661564Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/cutlass_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3663534Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3665267Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deep_gemm_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3667048Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3668891Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3670735Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3672664Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3674566Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/fused_batched_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3676321Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/fused_marlin_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3678011Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3679826Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3681562Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/layer.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3683245Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/modular_kernel.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3684996Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_align_block_size.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3686711Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3688531Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3690323Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_torch_iterative.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3692434Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3694279Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3696110Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3697967Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/routing_simulator.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3699810Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/topk_weight_and_reduce.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3701655Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3703526Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/trtllm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3705201Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T10:18:56.3706453Z #34 11766.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3708112Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3710377Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3712633Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3714947Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3717165Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3719362Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3721681Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3723862Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3726112Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3728352Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3730592Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3733023Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3735289Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3737439Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3739540Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3741634Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3744029Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3746401Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3748978Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3751551Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3753863Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3755969Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3758331Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3760628Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3762853Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3765045Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3767349Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3769663Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3772354Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3774799Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3777223Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3779539Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3781802Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3784243Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3786253Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3788372Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3790505Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3792687Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3794719Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3796863Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3798988Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 
2025-09-07T10:18:56.3801204Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3803460Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3805655Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3807858Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3810053Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3812558Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3814739Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3817054Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3819482Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3821847Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3824242Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3826545Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3828766Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3831014Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3833264Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3835557Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3837793Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3840030Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3842069Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3844161Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3846396Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3848934Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3851628Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3854139Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3856570Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3858952Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3861427Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3863876Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3866278Z #34 
11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3868634Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3871124Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3873503Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3875877Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3878236Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3880628Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3883026Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3885370Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3887834Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3890283Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3892929Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3895407Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3897846Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3900234Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3902607Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3905143Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3907558Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3909824Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3912082Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3914460Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3916815Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3919060Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3921217Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3923325Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3925466Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3927576Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3929784Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json 
-> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3932144Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3934336Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3936650Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3938950Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3941188Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3943490Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3945591Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3947780Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3950326Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3952554Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3954708Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3956954Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3959150Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3961506Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3963632Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3965865Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3967936Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3969995Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3972319Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3974603Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3976847Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3979210Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3981496Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3983762Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3985945Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3988031Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3990159Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3992199Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3994311Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3996415Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.3998618Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4000815Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4003041Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4005203Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4007483Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4009688Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4012104Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4014340Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4016628Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4018884Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4021214Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4023621Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4025840Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4028006Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4030184Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json 
-> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4032304Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4034473Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4036627Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4038816Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4040894Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4043115Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4045295Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4047516Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4049977Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4052296Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4054674Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4056911Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4059250Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4061622Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4063902Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json 
-> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4066035Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4068272Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4070482Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4072696Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4074814Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4076988Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4079224Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4081466Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4083693Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4085820Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4087911Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4089978Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4092436Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4094745Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4097048Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4099272Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4101508Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4103946Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4106108Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4108275Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4110366Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4112584Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4114750Z #34 11766.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4116981Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4119208Z #34 11766.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4121326Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4123588Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4125761Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4127896Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4130066Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4132480Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4134722Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4137003Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4139243Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4141479Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4143874Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4146132Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4148379Z #34 11766.4 
copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4150653Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/README -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T10:18:56.4151958Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4153214Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4154862Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/abstract.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4156486Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/linear_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4158272Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba2_metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4159988Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba_mixer.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4161811Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba_mixer2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4163505Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4165118Z #34 11766.4 
copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/short_conv.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T10:18:56.4166301Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4167552Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4169250Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/causal_conv1d.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4171130Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/layernorm_gated.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4173028Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/mamba_ssm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4174764Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_bmm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4176520Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_chunk_scan.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4178300Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4180092Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_combined.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4181947Z #34 11766.4 
copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_state_passing.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T10:18:56.4183303Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4184696Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4186450Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/auto_round.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4188185Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/awq.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4189946Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/awq_marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4191721Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/awq_triton.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4193537Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/base_config.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4195300Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/bitblas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4197098Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/bitsandbytes.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 
2025-09-07T10:18:56.4198940Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/deepgemm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4200798Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/deepspeedfp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4202616Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/experts_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4204343Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4206140Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4207786Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gguf.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4209565Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4211578Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq_bitblas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4213428Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq_marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4215275Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4217158Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/hqq_marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4218993Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/inc.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4220823Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/input_quant_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4222685Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/ipex_quant.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4224581Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kv_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4226379Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/modelopt.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4228175Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/moe_wna16.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4229955Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/mxfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4231643Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/petit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4233727Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/ptpc_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4235452Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/rtn.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4237184Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/schema.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4238892Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/torchao.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4240683Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/tpu_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T10:18:56.4242113Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T10:18:56.4243723Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T10:18:56.4245964Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T10:18:56.4248246Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T10:18:56.4250924Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T10:18:56.4253166Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T10:18:56.4254808Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4256544Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4259098Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4261664Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4264408Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 
2025-09-07T10:18:56.4266987Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4269671Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4272206Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4274777Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4277359Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4279930Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4282551Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4285093Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T10:18:56.4286933Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T10:18:56.4288744Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/linear.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T10:18:56.4291413Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/module.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T10:18:56.4293863Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T10:18:56.4311964Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T10:18:56.4314142Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T10:18:56.4315991Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/kernels 2025-09-07T10:18:56.4317519Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels 2025-09-07T10:18:56.4319032Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4320786Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4323185Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4325482Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4327757Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4330059Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/conch.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4332701Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4335054Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4337460Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4339833Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4342269Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T10:18:56.4344074Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4345762Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4348019Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4350503Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4352646Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4354943Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4357199Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4359458Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/xla.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T10:18:56.4361129Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/quark 2025-09-07T10:18:56.4362484Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T10:18:56.4364398Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/quark.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T10:18:56.4366279Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/quark_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T10:18:56.4368285Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T10:18:56.4369671Z #34 11766.4 creating 
build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T10:18:56.4371440Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T10:18:56.4373641Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T10:18:56.4375812Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T10:18:56.4378105Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T10:18:56.4380349Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T10:18:56.4381917Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4383365Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4385334Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/allspark_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4387351Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/bitblas_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4389321Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4391352Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/flashinfer_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4393339Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/fp8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4395282Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/gptq_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4397188Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/int8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4399096Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/layer_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4401077Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/machete_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4403026Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4404953Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4406972Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4408943Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4411044Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4413276Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/mxfp4_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4415219Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/mxfp8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4417298Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4419309Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/nvfp4_moe_support.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4421380Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/petit_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4423462Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/quant_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4425432Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T10:18:56.4426840Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4428709Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4431497Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4434173Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4436846Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4439579Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4442356Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4445078Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4447772Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4450883Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4453613Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 
2025-09-07T10:18:56.4456338Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4459104Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4461938Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4464791Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4467556Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4470255Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4472902Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4475568Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4478163Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4480790Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4483432Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4486150Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4488772Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4491792Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4494618Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4497491Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4500213Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4502893Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4505633Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4508424Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4511107Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4513750Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4516433Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4519134Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4521861Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4524587Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4527286Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4529947Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4532890Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4535594Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4538382Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4541116Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4543901Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4546651Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4549722Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4552461Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4555295Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4558057Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4560855Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4563596Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4566146Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4568809Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4571664Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4574554Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4577295Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4579996Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4582796Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4585673Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4588386Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4591112Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4593764Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4596367Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4599002Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4601731Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4604453Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4607100Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4609749Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4612619Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4615312Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4618097Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4620905Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4623771Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4626415Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4629160Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4631816Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4634357Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4637028Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4639666Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4642431Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4645136Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4647763Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4650939Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4653667Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4656489Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4659154Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4661835Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4664683Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4667244Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4669904Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4672564Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4675196Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4677940Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4680709Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4683420Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4686047Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4688704Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4691487Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4694038Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4696572Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4699159Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4701661Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4704240Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4706708Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4709172Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4711691Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4714199Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4716748Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4719210Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4721680Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4724187Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4726605Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4729027Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4731792Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4734378Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4736949Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4739492Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4741997Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4744599Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4747007Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4749862Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4752441Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4755012Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4757670Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4760273Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4762892Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4765393Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4767847Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4770265Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4772980Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4775468Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4778008Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4780551Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4783413Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4785851Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4788300Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4790792Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4793223Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4795610Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4797987Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4800347Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4802729Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4805152Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4807595Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4810079Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4812838Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4815476Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4818021Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4820510Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4823177Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4825687Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4828107Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4830517Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4833005Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4835455Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4837884Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4840257Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4842637Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4845040Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4847509Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4850484Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4853197Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4855574Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/README.md -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4857878Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4860466Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4863217Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4865661Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4868027Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4870402Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4872771Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4875172Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4877583Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4880065Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4882515Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4884967Z #34 
11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4887432Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4889889Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4892595Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4895136Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4897688Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4900214Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4902734Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4905251Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4907672Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4910289Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4912825Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4915297Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4917762Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4920224Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4922680Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4925200Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4927595Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4930056Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4932807Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4935361Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4937917Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4940489Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4943005Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4945489Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4947902Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4950701Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4953283Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4955905Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4958411Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4960903Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4963536Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4965995Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4968497Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4971069Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4973833Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4976466Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4979048Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T10:18:56.4980801Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4982081Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4983876Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/base.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4985537Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/common.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4987316Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4989090Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/dual_chunk_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4990877Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4992697Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4994470Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4996273Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/linear_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4998020Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/llama3_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.4999754Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/llama4_vision_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.5001481Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/mrope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.5003196Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.5004984Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.5006821Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T10:18:56.5008114Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/shared_fused_moe 2025-09-07T10:18:56.5009321Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/shared_fused_moe 2025-09-07T10:18:56.5011103Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/shared_fused_moe 2025-09-07T10:18:56.5012529Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/model_loader 2025-09-07T10:18:56.5013624Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5015143Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/base_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5016739Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/bitsandbytes_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5018336Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/default_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5019938Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/dummy_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5021473Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/gguf_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5023049Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/runai_streamer_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5024717Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/sharded_state_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5026192Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/tensorizer.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5027697Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/tensorizer_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5029139Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/tpu.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5030512Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5031933Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/weight_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T10:18:56.5032972Z #34 11766.4 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/models 2025-09-07T10:18:56.5033939Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5035250Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/adapters.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5036563Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/aimv2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5037892Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/apertus.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5039195Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/arcee.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5040478Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/arctic.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5041812Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/aria.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5043118Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/aya_vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5044437Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/baichuan.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5045778Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bailing_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5047096Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bamba.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5048405Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bart.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5050274Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bert.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5051748Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bert_with_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5053157Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/blip.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5054526Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/blip2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5055882Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bloom.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5057353Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/chameleon.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5058779Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/chatglm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5060161Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/clip.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5061587Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/cohere2_vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5063136Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/commandr.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5064459Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/config.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T10:18:56.5065840Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/constant_size_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5067203Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/dbrx.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5068547Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5069914Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5071290Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5072700Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_v2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5074063Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_vl2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5075388Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/donut.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5076692Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/dots1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5078006Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5079380Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5080728Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5082084Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45_vl_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5083424Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5084750Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/exaone.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5086073Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/exaone4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5087461Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/fairseq2_llama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5088819Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/falcon.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5090137Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/falcon_h1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5091728Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/florence2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5093198Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/fuyu.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5094565Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5095954Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5097345Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5098773Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3_mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5100180Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3n.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5101595Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3n_mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5103008Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5104399Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5105685Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4_1v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5106988Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5108315Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4_moe_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5109660Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5110949Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5112266Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_bigcode.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5113576Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_j.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5114874Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_neox.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5116182Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_oss.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5117703Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granite.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5119105Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granite_speech.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5120508Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granitemoe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5121954Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granitemoehybrid.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5123436Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granitemoeshared.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5124842Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gritlm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5126179Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/grok1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5127501Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/h2ovl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5128948Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/hunyuan_v1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5130376Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/hyperclovax_vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5132122Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/idefics2_vision_model.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5133634Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/idefics3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5135059Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5136522Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interfaces_base.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5137964Z #34 11766.4 
copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/intern_vit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5139385Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/internlm2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5140844Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/internlm2_ve.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5142276Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interns1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5143885Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interns1_vit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5145221Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/internvl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5146522Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/jais.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5147809Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/jamba.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5149456Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/jina_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5150839Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/keye.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5152229Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/keye_vl1_5.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5153617Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/kimi_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5154995Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/lfm2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5156358Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5157737Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5159151Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama4_eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5160632Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama_eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5162249Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama_eagle3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5163860Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5165216Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava_next.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5166578Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava_next_video.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5167971Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava_onevision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5169312Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mamba.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5170620Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mamba2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5172272Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mamba_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5173689Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/medusa.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5175119Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/midashenglm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5176516Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mimo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5177903Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mimo_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5179318Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5180761Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpm3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5182206Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpm_eagle.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5183846Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpmo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5185170Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpmv.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5186527Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minimax_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5187895Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minimax_text_01.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5189266Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minimax_vl_01.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5190611Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mistral3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5191976Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mixtral.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5193319Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mixtral_quant.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5194696Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mllama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5196000Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mllama4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5197360Z #34 11766.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mlp_speculator.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5198736Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/modernbert.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5200099Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/module_mapping.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5201447Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/molmo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5202773Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/moonvit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5204081Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mpt.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5205390Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5206717Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron_h.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5208072Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron_nas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5209436Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5210855Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nvlm_d.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5212385Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/olmo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5213764Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/olmo2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5215136Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/olmoe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5216515Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/opt.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5218039Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/orion.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5219426Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ovis.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5220806Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ovis2_5.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5222249Z #34 11766.4 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/paligemma.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5223787Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/persimmon.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5225174Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5226489Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5227812Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi3v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5229288Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4_multimodal.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5230658Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4flash.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5231987Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5233350Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4mm_audio.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5234713Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4mm_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5236050Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phimoe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5237360Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/pixtral.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5238676Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/plamo2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5239975Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T10:18:56.5241281Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5242635Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_5_omni_thinker.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5244013Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_5_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5245335Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_audio.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5246680Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5247995Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_rm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5249801Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5251275Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5252725Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen3_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5254129Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5255587Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/registry.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5256992Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/roberta.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5258376Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/rvl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5259762Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/seed_oss.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5261145Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/siglip.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5262693Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/siglip2navit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5264142Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/skyworkr1v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5265523Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/smolvlm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5266873Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/solar.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5268223Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/stablelm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5269614Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/starcoder2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5271013Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/step3_text.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5272413Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/step3_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5273750Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/swin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5275099Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/tarsier.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5276459Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/telechat2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5277831Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/teleflm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5279196Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/terratorch.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5280612Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/transformers.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5282021Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ultravox.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5283400Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5284750Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T10:18:56.5286108Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/voxtral.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5287484Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/whisper.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5288836Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/zamba2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T10:18:56.5289838Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/warmup 2025-09-07T10:18:56.5290897Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/warmup 2025-09-07T10:18:56.5292499Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup/deep_gemm_warmup.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/warmup 2025-09-07T10:18:56.5294006Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup/kernel_warmup.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/warmup 2025-09-07T10:18:56.5295031Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/multimodal 2025-09-07T10:18:56.5295903Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5297066Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/audio.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5298229Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/base.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5299393Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/cache.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 
2025-09-07T10:18:56.5300559Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/hasher.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5301742Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/image.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5302957Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/inputs.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5304181Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/parse.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5305323Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/processing.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5306478Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/profiling.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5307626Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5308746Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5309846Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/video.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T10:18:56.5310660Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/platforms 2025-09-07T10:18:56.5311461Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5312563Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/cpu.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5313631Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/cuda.py -> 
build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5314726Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5315858Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/rocm.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5317079Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/tpu.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5318179Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/xpu.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T10:18:56.5318985Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/plugins 2025-09-07T10:18:56.5319773Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins 2025-09-07T10:18:56.5320649Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/plugins/io_processors 2025-09-07T10:18:56.5321655Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/io_processors 2025-09-07T10:18:56.5323081Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/io_processors 2025-09-07T10:18:56.5324121Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/plugins/lora_resolvers 2025-09-07T10:18:56.5325139Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/lora_resolvers 2025-09-07T10:18:56.5326595Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers/filesystem_resolver.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/lora_resolvers 2025-09-07T10:18:56.5328056Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers/README.md -> build/bdist.linux-x86_64/wheel/./vllm/plugins/lora_resolvers 2025-09-07T10:18:56.5329013Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/profiler 2025-09-07T10:18:56.5329933Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/profiler/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/profiler 2025-09-07T10:18:56.5331328Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/profiler/layerwise_profile.py -> build/bdist.linux-x86_64/wheel/./vllm/profiler 2025-09-07T10:18:56.5332552Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/profiler/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/profiler 2025-09-07T10:18:56.5333388Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/ray 2025-09-07T10:18:56.5334150Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/ray/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/ray 2025-09-07T10:18:56.5335210Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/ray/lazy_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/ray 2025-09-07T10:18:56.5336249Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/ray/ray_env.py -> build/bdist.linux-x86_64/wheel/./vllm/ray 2025-09-07T10:18:56.5337054Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/reasoning 2025-09-07T10:18:56.5337917Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5339141Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/abs_reasoning_parsers.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5340496Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/deepseek_r1_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5341892Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/reasoning/glm4_moe_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5343366Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/gptoss_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5344762Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/granite_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5346091Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/hunyuan_a13b_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5347393Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/mistral_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5348652Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/qwen3_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5350441Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/step3_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T10:18:56.5351399Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/third_party 2025-09-07T10:18:56.5352264Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/third_party/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/third_party 2025-09-07T10:18:56.5353518Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/third_party/pynvml.py -> build/bdist.linux-x86_64/wheel/./vllm/third_party 2025-09-07T10:18:56.5354445Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils 2025-09-07T10:18:56.5355436Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5356787Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/config.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5358153Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/detokenizer.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5359602Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/detokenizer_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5361164Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/dynamic_module.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5362648Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processor.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5363942Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/s3_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5365233Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizer.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5366545Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizer_base.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5367895Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizer_group.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5369206Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T10:18:56.5370193Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5371600Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5373301Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5375032Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_basic.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5376852Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_blip2.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5378619Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_chatml.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5380433Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_deepseek_vl2.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5382237Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_fuyu.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5384203Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_minicpmv45.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T10:18:56.5385449Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/configs 2025-09-07T10:18:56.5386503Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5387911Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/arctic.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5389345Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/chatglm.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5390800Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/deepseek_vl2.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5392229Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5393678Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/falcon.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5395086Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/jais.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5396481Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/kimi_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5397894Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/medusa.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5399338Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/midashenglm.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5400804Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/mistral.py -> 
build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5402273Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/mlp_speculator.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5403757Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/moonvit.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5405201Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/nemotron.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5406653Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/nemotron_h.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5408144Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/nemotron_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5409573Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/ovis.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5411058Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/step3_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5412733Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/ultravox.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T10:18:56.5413927Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/configs/speculators 2025-09-07T10:18:56.5415287Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs/speculators 
2025-09-07T10:18:56.5417119Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators/algos.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs/speculators 2025-09-07T10:18:56.5418889Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators/base.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs/speculators 2025-09-07T10:18:56.5420129Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/processors 2025-09-07T10:18:56.5421297Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T10:18:56.5422900Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/deepseek_vl2.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T10:18:56.5424607Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/ovis.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T10:18:56.5426124Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/ovis2_5.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T10:18:56.5427249Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/tokenizers 2025-09-07T10:18:56.5428473Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/tokenizers 2025-09-07T10:18:56.5429965Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers/mistral.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/tokenizers 2025-09-07T10:18:56.5431001Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/triton_utils 
2025-09-07T10:18:56.5431836Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/triton_utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/triton_utils 2025-09-07T10:18:56.5432981Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/triton_utils/importing.py -> build/bdist.linux-x86_64/wheel/./vllm/triton_utils 2025-09-07T10:18:56.5433803Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/usage 2025-09-07T10:18:56.5434582Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/usage/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/usage 2025-09-07T10:18:56.5435606Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/usage/usage_lib.py -> build/bdist.linux-x86_64/wheel/./vllm/usage 2025-09-07T10:18:56.5436361Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/utils 2025-09-07T10:18:56.5437107Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T10:18:56.5438146Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/utils/deep_gemm.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T10:18:56.5439195Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/utils/flashinfer.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T10:18:56.5440249Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/utils/jsontree.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T10:18:56.5441294Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/utils/tensor_schema.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T10:18:56.5442071Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1 2025-09-07T10:18:56.5442764Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5443787Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/cudagraph_dispatcher.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5444901Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/kv_cache_interface.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5445904Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/outputs.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5446880Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/request.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5447863Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/serial_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5448995Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T10:18:56.5449964Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/attention 2025-09-07T10:18:56.5450914Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention 2025-09-07T10:18:56.5451869Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/attention/backends 2025-09-07T10:18:56.5452978Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5454424Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/cpu_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5455891Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5457367Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/flashinfer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5458885Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/flex_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5460402Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/linear_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5461879Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mamba1_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5463703Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mamba2_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5465140Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mamba_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5466516Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/pallas.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5467944Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/rocm_aiter_fa.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5469356Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/short_conv_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5470768Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/tree_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5472155Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/triton_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5473519Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/utils.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5474883Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/xformers.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T10:18:56.5475952Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5476990Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5478424Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/common.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5479892Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/cutlass_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5481380Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/flashattn_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5482877Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/flashmla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5484380Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/rocm_aiter_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5485861Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/triton_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T10:18:56.5486847Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/core 2025-09-07T10:18:56.6197963Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 
2025-09-07T10:18:56.6201017Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/block_pool.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T10:18:56.6204147Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/encoder_cache_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T10:18:56.6206355Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/kv_cache_coordinator.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T10:18:56.6207526Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/kv_cache_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T10:18:56.6208669Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/kv_cache_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T10:18:56.6210012Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/single_type_kv_cache_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T10:18:56.6211080Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/core/sched 2025-09-07T10:18:56.6212161Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6213505Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/async_scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6214813Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6216054Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/output.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6217330Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/request_queue.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6218625Z 
#34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6219857Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T10:18:56.6220807Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/engine 2025-09-07T10:18:56.6221663Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6222920Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/async_llm.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6224179Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/coordinator.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6225284Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/core.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6226383Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/core_client.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6227519Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/detokenizer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6228706Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/exceptions.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6229828Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/llm_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6230946Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/logprobs.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6232090Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/output_processor.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6233294Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/parallel_sampling.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6234451Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/processor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6235563Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T10:18:56.6236377Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/executor 2025-09-07T10:18:56.6237190Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T10:18:56.6238352Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/abstract.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T10:18:56.6239550Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/multiproc_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T10:18:56.6240830Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/ray_distributed_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T10:18:56.6241815Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/metrics 2025-09-07T10:18:56.6242621Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T10:18:56.6243733Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/loggers.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T10:18:56.6244878Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/prometheus.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T10:18:56.6246023Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/ray_wrappers.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T10:18:56.6247156Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/reader.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T10:18:56.6248244Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/stats.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T10:18:56.6249420Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/pool 2025-09-07T10:18:56.6250414Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/pool/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/pool 2025-09-07T10:18:56.6251592Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/pool/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/pool 2025-09-07T10:18:56.6252449Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample 2025-09-07T10:18:56.6253296Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T10:18:56.6254460Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T10:18:56.6255699Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/rejection_sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T10:18:56.6256926Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T10:18:56.6257934Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample/logits_processor 2025-09-07T10:18:56.6259043Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T10:18:56.6260555Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/builtin.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T10:18:56.6262090Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T10:18:56.6263669Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/state.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T10:18:56.6264697Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample/ops 2025-09-07T10:18:56.6265591Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T10:18:56.6266782Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/bad_words.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T10:18:56.6268044Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/logprobs.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T10:18:56.6269270Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/penalties.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T10:18:56.6270545Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/topk_topp_sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T10:18:56.6271536Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample/tpu 2025-09-07T10:18:56.6272413Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/tpu 2025-09-07T10:18:56.6273627Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/tpu 2025-09-07T10:18:56.6274849Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu/sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/tpu 2025-09-07T10:18:56.6275745Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/spec_decode 2025-09-07T10:18:56.6276635Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6277811Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6279049Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/medusa.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6280260Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6281465Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/metrics.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6282818Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/ngram_proposer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6284012Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T10:18:56.6284892Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/structured_output 2025-09-07T10:18:56.6285868Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6287245Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_guidance.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6288698Z #34 11766.5 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_lm_format_enforcer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6290146Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_outlines.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6291785Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_types.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6293280Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_xgrammar.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6294733Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/request.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6296111Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T10:18:56.6297081Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/v1/worker 2025-09-07T10:18:56.6297950Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6299119Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/block_table.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6300336Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/cpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6301563Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/cpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6302771Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/gpu_input_batch.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6304178Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/gpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6305298Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/gpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6306510Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/kv_connector_model_runner_mixin.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6307774Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/lora_model_runner_mixin.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6308998Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/tpu_input_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6310149Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/tpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6311269Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/tpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6312364Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6313459Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/worker_base.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6314588Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/xpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6315720Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/xpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T10:18:56.6316546Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/worker 
2025-09-07T10:18:56.6317311Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6318369Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/cache_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6319472Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/enc_dec_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6320584Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6321676Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/model_runner_base.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6322766Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6323800Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/worker.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6324845Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/worker/worker_base.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T10:18:56.6325832Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/py.typed -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:18:56.6326603Z #34 11766.5 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn 2025-09-07T10:18:56.6327467Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/.gitkeep -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T10:18:56.6328666Z #34 11766.5 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T10:18:57.3960187Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T10:18:57.5485629Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T10:18:57.5486697Z #34 11767.4 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn/layers 2025-09-07T10:18:57.5487722Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/layers 2025-09-07T10:18:57.5489091Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/layers 2025-09-07T10:18:57.5490081Z #34 11767.4 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn/ops 2025-09-07T10:18:57.5491039Z #34 11767.4 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:57.5492328Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:57.5493804Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/ops/triton 2025-09-07T10:18:57.5495210Z #34 11767.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T10:19:01.8937155Z #34 11771.9 copying build/lib.linux-x86_64-cpython-312/vllm/_moe_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:19:02.9451439Z #34 11773.0 copying build/lib.linux-x86_64-cpython-312/vllm/_flashmla_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:19:03.1038707Z #34 11773.0 copying build/lib.linux-x86_64-cpython-312/vllm/cumem_allocator.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:19:03.1041468Z #34 
11773.0 copying build/lib.linux-x86_64-cpython-312/vllm/_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T10:19:05.3652866Z #34 11775.4 running install_egg_info 2025-09-07T10:19:05.5414292Z #34 11775.4 Copying vllm.egg-info to build/bdist.linux-x86_64/wheel/./vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-py3.12.egg-info 2025-09-07T10:19:05.5415131Z #34 11775.4 running install_scripts 2025-09-07T10:19:05.5415840Z #34 11775.4 creating build/bdist.linux-x86_64/wheel/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/WHEEL 2025-09-07T10:19:05.5417074Z #34 11775.4 creating 'vllm-dist/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it 2025-09-07T10:19:23.6358949Z #34 11793.7 adding 'vllm/_C.abi3.so' 2025-09-07T10:19:24.3967578Z #34 11794.4 adding 'vllm/__init__.py' 2025-09-07T10:19:24.5954424Z #34 11794.4 adding 'vllm/_custom_ops.py' 2025-09-07T10:19:24.5954876Z #34 11794.5 adding 'vllm/_flashmla_C.abi3.so' 2025-09-07T10:19:24.5955267Z #34 11794.5 adding 'vllm/_ipex_ops.py' 2025-09-07T10:19:31.8272625Z #34 11801.9 adding 'vllm/_moe_C.abi3.so' 2025-09-07T10:19:32.1591474Z #34 11802.2 adding 'vllm/_version.py' 2025-09-07T10:19:32.2593931Z #34 11802.2 adding 'vllm/beam_search.py' 2025-09-07T10:19:32.2594324Z #34 11802.2 adding 'vllm/collect_env.py' 2025-09-07T10:19:32.2594695Z #34 11802.2 adding 'vllm/connections.py' 2025-09-07T10:19:32.2595079Z #34 11802.2 adding 'vllm/cumem_allocator.abi3.so' 2025-09-07T10:19:32.2595666Z #34 11802.2 adding 'vllm/env_override.py' 2025-09-07T10:19:32.2596025Z #34 11802.2 adding 'vllm/envs.py' 2025-09-07T10:19:32.2596379Z #34 11802.2 adding 'vllm/forward_context.py' 2025-09-07T10:19:32.2596733Z #34 11802.2 adding 'vllm/logger.py' 2025-09-07T10:19:32.2597091Z #34 11802.2 adding 'vllm/logits_process.py' 2025-09-07T10:19:32.2597469Z #34 11802.2 adding 'vllm/logprobs.py' 2025-09-07T10:19:32.2597957Z #34 11802.2 adding 'vllm/outputs.py' 
2025-09-07T10:19:32.2598301Z #34 11802.2 adding 'vllm/pooling_params.py' 2025-09-07T10:19:32.2598669Z #34 11802.2 adding 'vllm/py.typed' 2025-09-07T10:19:32.2599023Z #34 11802.2 adding 'vllm/sampling_params.py' 2025-09-07T10:19:32.2599401Z #34 11802.2 adding 'vllm/scalar_type.py' 2025-09-07T10:19:32.2599745Z #34 11802.2 adding 'vllm/scripts.py' 2025-09-07T10:19:32.2600087Z #34 11802.2 adding 'vllm/sequence.py' 2025-09-07T10:19:32.2600430Z #34 11802.2 adding 'vllm/tasks.py' 2025-09-07T10:19:32.2600755Z #34 11802.2 adding 'vllm/test_utils.py' 2025-09-07T10:19:32.2601107Z #34 11802.2 adding 'vllm/tracing.py' 2025-09-07T10:19:32.2601428Z #34 11802.2 adding 'vllm/version.py' 2025-09-07T10:19:32.2601814Z #34 11802.2 adding 'vllm/adapter_commons/__init__.py' 2025-09-07T10:19:32.2602241Z #34 11802.2 adding 'vllm/adapter_commons/layers.py' 2025-09-07T10:19:32.2602673Z #34 11802.2 adding 'vllm/adapter_commons/models.py' 2025-09-07T10:19:32.2603164Z #34 11802.2 adding 'vllm/adapter_commons/request.py' 2025-09-07T10:19:32.2603600Z #34 11802.2 adding 'vllm/adapter_commons/utils.py' 2025-09-07T10:19:32.2604069Z #34 11802.2 adding 'vllm/adapter_commons/worker_manager.py' 2025-09-07T10:19:32.2604502Z #34 11802.2 adding 'vllm/assets/__init__.py' 2025-09-07T10:19:32.2604885Z #34 11802.2 adding 'vllm/assets/audio.py' 2025-09-07T10:19:32.2605239Z #34 11802.2 adding 'vllm/assets/base.py' 2025-09-07T10:19:32.2605604Z #34 11802.2 adding 'vllm/assets/image.py' 2025-09-07T10:19:32.2605955Z #34 11802.2 adding 'vllm/assets/video.py' 2025-09-07T10:19:32.2606342Z #34 11802.2 adding 'vllm/attention/__init__.py' 2025-09-07T10:19:32.2606721Z #34 11802.2 adding 'vllm/attention/layer.py' 2025-09-07T10:19:32.2607108Z #34 11802.2 adding 'vllm/attention/selector.py' 2025-09-07T10:19:32.2607544Z #34 11802.2 adding 'vllm/attention/backends/__init__.py' 2025-09-07T10:19:32.2607995Z #34 11802.2 adding 'vllm/attention/backends/abstract.py' 2025-09-07T10:19:32.2608543Z #34 11802.2 adding 
'vllm/attention/backends/differential_flash_attn.py' 2025-09-07T10:19:32.2609129Z #34 11802.2 adding 'vllm/attention/backends/dual_chunk_flash_attn.py' 2025-09-07T10:19:32.2609728Z #34 11802.2 adding 'vllm/attention/backends/flash_attn.py' 2025-09-07T10:19:32.2610191Z #34 11802.2 adding 'vllm/attention/backends/flashmla.py' 2025-09-07T10:19:32.2610693Z #34 11802.2 adding 'vllm/attention/backends/placeholder_attn.py' 2025-09-07T10:19:32.2611541Z #34 11802.2 adding 'vllm/attention/backends/rocm_aiter_mla.py' 2025-09-07T10:19:32.2612057Z #34 11802.2 adding 'vllm/attention/backends/rocm_flash_attn.py' 2025-09-07T10:19:32.2612581Z #34 11802.2 adding 'vllm/attention/backends/triton_mla.py' 2025-09-07T10:19:32.2613046Z #34 11802.2 adding 'vllm/attention/backends/utils.py' 2025-09-07T10:19:32.2613518Z #34 11802.2 adding 'vllm/attention/backends/xformers.py' 2025-09-07T10:19:32.2614000Z #34 11802.2 adding 'vllm/attention/backends/mla/__init__.py' 2025-09-07T10:19:32.2614505Z #34 11802.2 adding 'vllm/attention/backends/mla/common.py' 2025-09-07T10:19:32.2614972Z #34 11802.2 adding 'vllm/attention/layers/__init__.py' 2025-09-07T10:19:32.2615500Z #34 11802.2 adding 'vllm/attention/layers/chunked_local_attention.py' 2025-09-07T10:19:32.2616097Z #34 11802.2 adding 'vllm/attention/layers/encoder_only_attention.py' 2025-09-07T10:19:32.2616602Z #34 11802.2 adding 'vllm/attention/ops/__init__.py' 2025-09-07T10:19:32.2617131Z #34 11802.2 adding 'vllm/attention/ops/chunked_prefill_paged_decode.py' 2025-09-07T10:19:32.2617648Z #34 11802.2 adding 'vllm/attention/ops/common.py' 2025-09-07T10:19:32.2618127Z #34 11802.2 adding 'vllm/attention/ops/flashmla.py' 2025-09-07T10:19:32.2618593Z #34 11802.2 adding 'vllm/attention/ops/merge_attn_states.py' 2025-09-07T10:19:32.2619082Z #34 11802.2 adding 'vllm/attention/ops/paged_attn.py' 2025-09-07T10:19:32.2619587Z #34 11802.2 adding 'vllm/attention/ops/pallas_kv_cache_update.py' 2025-09-07T10:19:32.2620102Z #34 11802.2 adding 
'vllm/attention/ops/prefix_prefill.py' 2025-09-07T10:19:32.2620634Z #34 11802.2 adding 'vllm/attention/ops/rocm_aiter_mla.py' 2025-09-07T10:19:32.2621137Z #34 11802.2 adding 'vllm/attention/ops/rocm_aiter_paged_attn.py' 2025-09-07T10:19:32.2621702Z #34 11802.2 adding 'vllm/attention/ops/triton_decode_attention.py' 2025-09-07T10:19:32.2622256Z #34 11802.2 adding 'vllm/attention/ops/triton_flash_attention.py' 2025-09-07T10:19:32.2622941Z #34 11802.2 adding 'vllm/attention/ops/triton_merge_attn_states.py' 2025-09-07T10:19:32.2623498Z #34 11802.2 adding 'vllm/attention/ops/triton_unified_attention.py' 2025-09-07T10:19:32.2623991Z #34 11802.2 adding 'vllm/attention/utils/__init__.py' 2025-09-07T10:19:32.2624437Z #34 11802.2 adding 'vllm/attention/utils/fa_utils.py' 2025-09-07T10:19:32.2624900Z #34 11802.2 adding 'vllm/attention/utils/kv_sharing_utils.py' 2025-09-07T10:19:32.2625361Z #34 11802.2 adding 'vllm/benchmarks/__init__.py' 2025-09-07T10:19:32.2625765Z #34 11802.2 adding 'vllm/benchmarks/datasets.py' 2025-09-07T10:19:32.2626178Z #34 11802.2 adding 'vllm/benchmarks/latency.py' 2025-09-07T10:19:32.2626610Z #34 11802.2 adding 'vllm/benchmarks/serve.py' 2025-09-07T10:19:32.2627022Z #34 11802.2 adding 'vllm/benchmarks/throughput.py' 2025-09-07T10:19:32.2627463Z #34 11802.2 adding 'vllm/benchmarks/lib/__init__.py' 2025-09-07T10:19:32.2627944Z #34 11802.2 adding 'vllm/benchmarks/lib/endpoint_request_func.py' 2025-09-07T10:19:32.2628455Z #34 11802.2 adding 'vllm/benchmarks/lib/ready_checker.py' 2025-09-07T10:19:32.2628887Z #34 11802.2 adding 'vllm/benchmarks/lib/utils.py' 2025-09-07T10:19:32.2629313Z #34 11802.2 adding 'vllm/compilation/__init__.py' 2025-09-07T10:19:32.2629778Z #34 11802.2 adding 'vllm/compilation/activation_quant_fusion.py' 2025-09-07T10:19:32.2630257Z #34 11802.2 adding 'vllm/compilation/backends.py' 2025-09-07T10:19:32.2630713Z #34 11802.2 adding 'vllm/compilation/base_static_graph.py' 2025-09-07T10:19:32.2631182Z #34 11802.2 adding 
'vllm/compilation/collective_fusion.py' 2025-09-07T10:19:32.2631678Z #34 11802.2 adding 'vllm/compilation/compiler_interface.py' 2025-09-07T10:19:32.2632144Z #34 11802.2 adding 'vllm/compilation/counter.py' 2025-09-07T10:19:32.2632554Z #34 11802.2 adding 'vllm/compilation/cuda_graph.py' 2025-09-07T10:19:32.2633064Z #34 11802.2 adding 'vllm/compilation/cuda_piecewise_backend.py' 2025-09-07T10:19:32.2633531Z #34 11802.2 adding 'vllm/compilation/decorators.py' 2025-09-07T10:19:32.2634005Z #34 11802.2 adding 'vllm/compilation/fix_functionalization.py' 2025-09-07T10:19:32.2634487Z #34 11802.2 adding 'vllm/compilation/fusion.py' 2025-09-07T10:19:32.2634905Z #34 11802.2 adding 'vllm/compilation/fusion_attn.py' 2025-09-07T10:19:32.2635341Z #34 11802.2 adding 'vllm/compilation/fx_utils.py' 2025-09-07T10:19:32.2635776Z #34 11802.2 adding 'vllm/compilation/inductor_pass.py' 2025-09-07T10:19:32.2636200Z #34 11802.2 adding 'vllm/compilation/monitor.py' 2025-09-07T10:19:32.2636656Z #34 11802.2 adding 'vllm/compilation/multi_output_match.py' 2025-09-07T10:19:32.2637133Z #34 11802.2 adding 'vllm/compilation/noop_elimination.py' 2025-09-07T10:19:32.2637606Z #34 11802.3 adding 'vllm/compilation/pass_manager.py' 2025-09-07T10:19:32.2638075Z #34 11802.3 adding 'vllm/compilation/sequence_parallelism.py' 2025-09-07T10:19:32.2638611Z #34 11802.3 adding 'vllm/compilation/torch25_custom_graph_pass.py' 2025-09-07T10:19:32.2639141Z #34 11802.3 adding 'vllm/compilation/vllm_inductor_pass.py' 2025-09-07T10:19:32.2639589Z #34 11802.3 adding 'vllm/compilation/wrapper.py' 2025-09-07T10:19:32.2639993Z #34 11802.3 adding 'vllm/config/__init__.py' 2025-09-07T10:19:32.2640352Z #34 11802.3 adding 'vllm/config/cache.py' 2025-09-07T10:19:32.2640739Z #34 11802.3 adding 'vllm/config/compilation.py' 2025-09-07T10:19:32.2641158Z #34 11802.3 adding 'vllm/config/parallel.py' 2025-09-07T10:19:32.2641546Z #34 11802.3 adding 'vllm/config/scheduler.py' 2025-09-07T10:19:32.2641913Z #34 11802.3 adding 
'vllm/config/utils.py' 2025-09-07T10:19:32.2642280Z #34 11802.3 adding 'vllm/core/__init__.py' 2025-09-07T10:19:32.2642658Z #34 11802.3 adding 'vllm/core/block_manager.py' 2025-09-07T10:19:32.2643064Z #34 11802.3 adding 'vllm/core/evictor.py' 2025-09-07T10:19:32.2643441Z #34 11802.3 adding 'vllm/core/interfaces.py' 2025-09-07T10:19:32.2643887Z #34 11802.3 adding 'vllm/core/placeholder_block_space_manager.py' 2025-09-07T10:19:32.2644353Z #34 11802.3 adding 'vllm/core/scheduler.py' 2025-09-07T10:19:32.2644725Z #34 11802.3 adding 'vllm/core/block/__init__.py' 2025-09-07T10:19:32.2645134Z #34 11802.3 adding 'vllm/core/block/block_table.py' 2025-09-07T10:19:32.2645530Z #34 11802.3 adding 'vllm/core/block/common.py' 2025-09-07T10:19:32.2645978Z #34 11802.3 adding 'vllm/core/block/cpu_gpu_block_allocator.py' 2025-09-07T10:19:32.2646452Z #34 11802.3 adding 'vllm/core/block/interfaces.py' 2025-09-07T10:19:32.2646861Z #34 11802.3 adding 'vllm/core/block/naive_block.py' 2025-09-07T10:19:32.2647318Z #34 11802.3 adding 'vllm/core/block/prefix_caching_block.py' 2025-09-07T10:19:32.2647748Z #34 11802.3 adding 'vllm/core/block/utils.py' 2025-09-07T10:19:32.2648156Z #34 11802.3 adding 'vllm/device_allocator/__init__.py' 2025-09-07T10:19:32.2648613Z #34 11802.3 adding 'vllm/device_allocator/cumem.py' 2025-09-07T10:19:32.2649238Z #34 11802.3 adding 'vllm/distributed/__init__.py' 2025-09-07T10:19:32.2649852Z #34 11802.3 adding 'vllm/distributed/communication_op.py' 2025-09-07T10:19:32.2650320Z #34 11802.3 adding 'vllm/distributed/kv_events.py' 2025-09-07T10:19:32.2650852Z #34 11802.3 adding 'vllm/distributed/parallel_state.py' 2025-09-07T10:19:32.2651340Z #34 11802.3 adding 'vllm/distributed/tpu_distributed_utils.py' 2025-09-07T10:19:32.2651811Z #34 11802.3 adding 'vllm/distributed/utils.py' 2025-09-07T10:19:32.2652305Z #34 11802.3 adding 'vllm/distributed/device_communicators/__init__.py' 2025-09-07T10:19:32.2652909Z #34 11802.3 adding 
'vllm/distributed/device_communicators/all2all.py' 2025-09-07T10:19:32.2653536Z #34 11802.3 adding 'vllm/distributed/device_communicators/all_reduce_utils.py' 2025-09-07T10:19:32.2654266Z #34 11802.3 adding 'vllm/distributed/device_communicators/base_device_communicator.py' 2025-09-07T10:19:32.2655004Z #34 11802.3 adding 'vllm/distributed/device_communicators/cpu_communicator.py' 2025-09-07T10:19:32.2655689Z #34 11802.3 adding 'vllm/distributed/device_communicators/cuda_communicator.py' 2025-09-07T10:19:32.2656444Z #34 11802.3 adding 'vllm/distributed/device_communicators/cuda_wrapper.py' 2025-09-07T10:19:32.2657107Z #34 11802.3 adding 'vllm/distributed/device_communicators/custom_all_reduce.py' 2025-09-07T10:19:32.2657762Z #34 11802.3 adding 'vllm/distributed/device_communicators/pynccl.py' 2025-09-07T10:19:32.2658399Z #34 11802.3 adding 'vllm/distributed/device_communicators/pynccl_wrapper.py' 2025-09-07T10:19:32.2659069Z #34 11802.3 adding 'vllm/distributed/device_communicators/quick_all_reduce.py' 2025-09-07T10:19:32.2659766Z #34 11802.3 adding 'vllm/distributed/device_communicators/ray_communicator.py' 2025-09-07T10:19:32.2660428Z #34 11802.3 adding 'vllm/distributed/device_communicators/shm_broadcast.py' 2025-09-07T10:19:32.2661059Z #34 11802.3 adding 'vllm/distributed/device_communicators/symm_mem.py' 2025-09-07T10:19:32.2661698Z #34 11802.3 adding 'vllm/distributed/device_communicators/tpu_communicator.py' 2025-09-07T10:19:32.2662389Z #34 11802.3 adding 'vllm/distributed/device_communicators/xpu_communicator.py' 2025-09-07T10:19:32.2663081Z #34 11802.3 adding 'vllm/distributed/eplb/__init__.py' 2025-09-07T10:19:32.2663525Z #34 11802.3 adding 'vllm/distributed/eplb/eplb_state.py' 2025-09-07T10:19:32.2664004Z #34 11802.3 adding 'vllm/distributed/eplb/rebalance_algo.py' 2025-09-07T10:19:32.2664498Z #34 11802.3 adding 'vllm/distributed/eplb/rebalance_execute.py' 2025-09-07T10:19:32.2665056Z #34 11802.3 adding 'vllm/distributed/kv_transfer/README.md' 
2025-09-07T10:19:32.2665541Z #34 11802.3 adding 'vllm/distributed/kv_transfer/__init__.py' 2025-09-07T10:19:32.2666121Z #34 11802.3 adding 'vllm/distributed/kv_transfer/disagg_prefill_workflow.jpg' 2025-09-07T10:19:32.2666748Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_transfer_state.py' 2025-09-07T10:19:32.2667338Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/__init__.py' 2025-09-07T10:19:32.2667991Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/base.py' 2025-09-07T10:19:32.2668581Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/factory.py' 2025-09-07T10:19:32.2669187Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/utils.py' 2025-09-07T10:19:32.2669794Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/__init__.py' 2025-09-07T10:19:32.2670424Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/base.py' 2025-09-07T10:19:32.2671101Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py' 2025-09-07T10:19:32.2671814Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py' 2025-09-07T10:19:32.2672523Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py' 2025-09-07T10:19:32.2673264Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/shared_storage_connector.py' 2025-09-07T10:19:32.2674081Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/__init__.py' 2025-09-07T10:19:32.2674795Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py' 2025-09-07T10:19:32.3602013Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py' 2025-09-07T10:19:32.3602826Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py' 2025-09-07T10:19:32.3603598Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/__init__.py' 2025-09-07T10:19:32.3604234Z #34 
11802.3 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/base.py' 2025-09-07T10:19:32.3604900Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/mooncake_store.py' 2025-09-07T10:19:32.3605601Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/simple_buffer.py' 2025-09-07T10:19:32.3606242Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_pipe/__init__.py' 2025-09-07T10:19:32.3606816Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_pipe/base.py' 2025-09-07T10:19:32.3607384Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py' 2025-09-07T10:19:32.3608585Z #34 11802.3 adding 'vllm/distributed/kv_transfer/kv_pipe/pynccl_pipe.py' 2025-09-07T10:19:32.3609097Z #34 11802.3 adding 'vllm/engine/__init__.py' 2025-09-07T10:19:32.3609482Z #34 11802.3 adding 'vllm/engine/arg_utils.py' 2025-09-07T10:19:32.3609881Z #34 11802.3 adding 'vllm/engine/async_llm_engine.py' 2025-09-07T10:19:32.3610302Z #34 11802.3 adding 'vllm/engine/async_timeout.py' 2025-09-07T10:19:32.3610712Z #34 11802.3 adding 'vllm/engine/llm_engine.py' 2025-09-07T10:19:32.3611224Z #34 11802.3 adding 'vllm/engine/metrics.py' 2025-09-07T10:19:32.3611616Z #34 11802.3 adding 'vllm/engine/metrics_types.py' 2025-09-07T10:19:32.3612187Z #34 11802.3 adding 'vllm/engine/protocol.py' 2025-09-07T10:19:32.3612643Z #34 11802.3 adding 'vllm/engine/multiprocessing/__init__.py' 2025-09-07T10:19:32.3613153Z #34 11802.3 adding 'vllm/engine/multiprocessing/client.py' 2025-09-07T10:19:32.3613656Z #34 11802.3 adding 'vllm/engine/multiprocessing/engine.py' 2025-09-07T10:19:32.3614151Z #34 11802.3 adding 'vllm/engine/output_processor/__init__.py' 2025-09-07T10:19:32.3614687Z #34 11802.3 adding 'vllm/engine/output_processor/interfaces.py' 2025-09-07T10:19:32.3615235Z #34 11802.3 adding 'vllm/engine/output_processor/single_step.py' 2025-09-07T10:19:32.3615779Z #34 11802.3 adding 'vllm/engine/output_processor/stop_checker.py' 2025-09-07T10:19:32.3616304Z #34 11802.3 adding 
'vllm/engine/output_processor/util.py' 2025-09-07T10:19:32.3616828Z #34 11802.3 adding 'vllm/entrypoints/__init__.py' 2025-09-07T10:19:32.3617278Z #34 11802.3 adding 'vllm/entrypoints/api_server.py' 2025-09-07T10:19:32.3617721Z #34 11802.3 adding 'vllm/entrypoints/chat_utils.py' 2025-09-07T10:19:32.3618168Z #34 11802.3 adding 'vllm/entrypoints/constants.py' 2025-09-07T10:19:32.3618606Z #34 11802.3 adding 'vllm/entrypoints/context.py' 2025-09-07T10:19:32.3619092Z #34 11802.3 adding 'vllm/entrypoints/harmony_utils.py' 2025-09-07T10:19:32.3619542Z #34 11802.3 adding 'vllm/entrypoints/launcher.py' 2025-09-07T10:19:32.3619947Z #34 11802.3 adding 'vllm/entrypoints/llm.py' 2025-09-07T10:19:32.3620359Z #34 11802.3 adding 'vllm/entrypoints/logger.py' 2025-09-07T10:19:32.3620770Z #34 11802.3 adding 'vllm/entrypoints/renderer.py' 2025-09-07T10:19:32.3621219Z #34 11802.3 adding 'vllm/entrypoints/score_utils.py' 2025-09-07T10:19:32.3621636Z #34 11802.3 adding 'vllm/entrypoints/ssl.py' 2025-09-07T10:19:32.3622037Z #34 11802.3 adding 'vllm/entrypoints/tool.py' 2025-09-07T10:19:32.3622576Z #34 11802.3 adding 'vllm/entrypoints/tool_server.py' 2025-09-07T10:19:32.3622989Z #34 11802.3 adding 'vllm/entrypoints/utils.py' 2025-09-07T10:19:32.3623407Z #34 11802.3 adding 'vllm/entrypoints/cli/__init__.py' 2025-09-07T10:19:32.3623842Z #34 11802.3 adding 'vllm/entrypoints/cli/collect_env.py' 2025-09-07T10:19:32.3624284Z #34 11802.3 adding 'vllm/entrypoints/cli/main.py' 2025-09-07T10:19:32.3624744Z #34 11802.3 adding 'vllm/entrypoints/cli/openai.py' 2025-09-07T10:19:32.3625185Z #34 11802.3 adding 'vllm/entrypoints/cli/run_batch.py' 2025-09-07T10:19:32.3625602Z #34 11802.3 adding 'vllm/entrypoints/cli/serve.py' 2025-09-07T10:19:32.3626028Z #34 11802.3 adding 'vllm/entrypoints/cli/types.py' 2025-09-07T10:19:32.3626500Z #34 11802.3 adding 'vllm/entrypoints/cli/benchmark/__init__.py' 2025-09-07T10:19:32.3626991Z #34 11802.3 adding 'vllm/entrypoints/cli/benchmark/base.py' 
2025-09-07T10:19:32.3627492Z #34 11802.3 adding 'vllm/entrypoints/cli/benchmark/latency.py' 2025-09-07T10:19:32.3627983Z #34 11802.3 adding 'vllm/entrypoints/cli/benchmark/main.py' 2025-09-07T10:19:32.3628477Z #34 11802.3 adding 'vllm/entrypoints/cli/benchmark/serve.py' 2025-09-07T10:19:32.3628988Z #34 11802.3 adding 'vllm/entrypoints/cli/benchmark/throughput.py' 2025-09-07T10:19:32.3629492Z #34 11802.3 adding 'vllm/entrypoints/openai/__init__.py' 2025-09-07T10:19:32.3629962Z #34 11802.3 adding 'vllm/entrypoints/openai/api_server.py' 2025-09-07T10:19:32.3630432Z #34 11802.3 adding 'vllm/entrypoints/openai/cli_args.py' 2025-09-07T10:19:32.3630930Z #34 11802.3 adding 'vllm/entrypoints/openai/logits_processors.py' 2025-09-07T10:19:32.3631456Z #34 11802.3 adding 'vllm/entrypoints/openai/protocol.py' 2025-09-07T10:19:32.3631928Z #34 11802.3 adding 'vllm/entrypoints/openai/run_batch.py' 2025-09-07T10:19:32.3632399Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_chat.py' 2025-09-07T10:19:32.3632945Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_classification.py' 2025-09-07T10:19:32.3633527Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_completion.py' 2025-09-07T10:19:32.3634062Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_embedding.py' 2025-09-07T10:19:32.3634588Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_engine.py' 2025-09-07T10:19:32.3635086Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_models.py' 2025-09-07T10:19:32.3635603Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_pooling.py' 2025-09-07T10:19:32.3636123Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_responses.py' 2025-09-07T10:19:32.3636648Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_score.py' 2025-09-07T10:19:32.3637188Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_tokenization.py' 2025-09-07T10:19:32.3637760Z #34 11802.3 adding 'vllm/entrypoints/openai/serving_transcription.py' 2025-09-07T10:19:32.3638313Z #34 11802.3 adding 
'vllm/entrypoints/openai/speech_to_text.py' 2025-09-07T10:19:32.3638843Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/__init__.py' 2025-09-07T10:19:32.3639513Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/abstract_tool_parser.py' 2025-09-07T10:19:32.3640223Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py' 2025-09-07T10:19:32.3640951Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/deepseekv3_tool_parser.py' 2025-09-07T10:19:32.3641661Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/glm4_moe_tool_parser.py' 2025-09-07T10:19:32.3642402Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py' 2025-09-07T10:19:32.3643124Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/granite_tool_parser.py' 2025-09-07T10:19:32.3643790Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py' 2025-09-07T10:19:32.3644494Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py' 2025-09-07T10:19:32.3645201Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/internlm2_tool_parser.py' 2025-09-07T10:19:32.3645900Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/jamba_tool_parser.py' 2025-09-07T10:19:32.3646573Z #34 11802.3 adding 'vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py' 2025-09-07T10:19:32.3647287Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py' 2025-09-07T10:19:32.3648003Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py' 2025-09-07T10:19:32.3648922Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/minimax_tool_parser.py' 2025-09-07T10:19:32.3649794Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py' 2025-09-07T10:19:32.3650501Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/openai_tool_parser.py' 2025-09-07T10:19:32.3651258Z #34 11802.4 adding 
'vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py' 2025-09-07T10:19:32.3651982Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py' 2025-09-07T10:19:32.3652705Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py' 2025-09-07T10:19:32.3653436Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/seed_oss_tool_parser.py' 2025-09-07T10:19:32.3654124Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/step3_tool_parser.py' 2025-09-07T10:19:32.3654754Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/utils.py' 2025-09-07T10:19:32.3655376Z #34 11802.4 adding 'vllm/entrypoints/openai/tool_parsers/xlam_tool_parser.py' 2025-09-07T10:19:32.3655916Z #34 11802.4 adding 'vllm/executor/__init__.py' 2025-09-07T10:19:32.3656337Z #34 11802.4 adding 'vllm/executor/executor_base.py' 2025-09-07T10:19:32.3656876Z #34 11802.4 adding 'vllm/executor/mp_distributed_executor.py' 2025-09-07T10:19:32.3657365Z #34 11802.4 adding 'vllm/executor/msgspec_utils.py' 2025-09-07T10:19:32.3657834Z #34 11802.4 adding 'vllm/executor/multiproc_worker_utils.py' 2025-09-07T10:19:32.3658355Z #34 11802.4 adding 'vllm/executor/ray_distributed_executor.py' 2025-09-07T10:19:32.3658835Z #34 11802.4 adding 'vllm/executor/ray_utils.py' 2025-09-07T10:19:32.3659264Z #34 11802.4 adding 'vllm/executor/uniproc_executor.py' 2025-09-07T10:19:32.3659699Z #34 11802.4 adding 'vllm/inputs/__init__.py' 2025-09-07T10:19:32.3660075Z #34 11802.4 adding 'vllm/inputs/data.py' 2025-09-07T10:19:32.3660453Z #34 11802.4 adding 'vllm/inputs/parse.py' 2025-09-07T10:19:32.3660840Z #34 11802.4 adding 'vllm/inputs/preprocess.py' 2025-09-07T10:19:32.3661253Z #34 11802.4 adding 'vllm/inputs/registry.py' 2025-09-07T10:19:32.3661654Z #34 11802.4 adding 'vllm/logging_utils/__init__.py' 2025-09-07T10:19:32.3662105Z #34 11802.4 adding 'vllm/logging_utils/dump_input.py' 2025-09-07T10:19:32.3662674Z #34 11802.4 adding 'vllm/logging_utils/formatter.py' 
2025-09-07T10:19:32.3663068Z #34 11802.4 adding 'vllm/lora/__init__.py' 2025-09-07T10:19:32.3663472Z #34 11802.4 adding 'vllm/lora/fully_sharded_layers.py' 2025-09-07T10:19:32.3663864Z #34 11802.4 adding 'vllm/lora/layers.py' 2025-09-07T10:19:32.3664217Z #34 11802.4 adding 'vllm/lora/lora.py' 2025-09-07T10:19:32.3664604Z #34 11802.4 adding 'vllm/lora/models.py' 2025-09-07T10:19:32.3664979Z #34 11802.4 adding 'vllm/lora/peft_helper.py' 2025-09-07T10:19:32.3665348Z #34 11802.4 adding 'vllm/lora/request.py' 2025-09-07T10:19:32.3665728Z #34 11802.4 adding 'vllm/lora/resolver.py' 2025-09-07T10:19:32.3666096Z #34 11802.4 adding 'vllm/lora/utils.py' 2025-09-07T10:19:32.3666472Z #34 11802.4 adding 'vllm/lora/worker_manager.py' 2025-09-07T10:19:32.3666923Z #34 11802.4 adding 'vllm/lora/ops/__init__.py' 2025-09-07T10:19:32.3667330Z #34 11802.4 adding 'vllm/lora/ops/ipex_ops/__init__.py' 2025-09-07T10:19:32.3667784Z #34 11802.4 adding 'vllm/lora/ops/ipex_ops/lora_ops.py' 2025-09-07T10:19:32.3668225Z #34 11802.4 adding 'vllm/lora/ops/torch_ops/__init__.py' 2025-09-07T10:19:32.3668685Z #34 11802.4 adding 'vllm/lora/ops/torch_ops/lora_ops.py' 2025-09-07T10:19:32.3669136Z #34 11802.4 adding 'vllm/lora/ops/triton_ops/__init__.py' 2025-09-07T10:19:32.3669621Z #34 11802.4 adding 'vllm/lora/ops/triton_ops/kernel_utils.py' 2025-09-07T10:19:32.3670133Z #34 11802.4 adding 'vllm/lora/ops/triton_ops/lora_expand_op.py' 2025-09-07T10:19:32.3670668Z #34 11802.4 adding 'vllm/lora/ops/triton_ops/lora_kernel_metadata.py' 2025-09-07T10:19:32.3671213Z #34 11802.4 adding 'vllm/lora/ops/triton_ops/lora_shrink_op.py' 2025-09-07T10:19:32.3671677Z #34 11802.4 adding 'vllm/lora/ops/triton_ops/utils.py' 2025-09-07T10:19:32.3672164Z #34 11802.4 adding 'vllm/lora/ops/xla_ops/__init__.py' 2025-09-07T10:19:32.3672591Z #34 11802.4 adding 'vllm/lora/ops/xla_ops/lora_ops.py' 2025-09-07T10:19:32.3673045Z #34 11802.4 adding 'vllm/lora/punica_wrapper/__init__.py' 2025-09-07T10:19:32.3673533Z #34 11802.4 adding 
'vllm/lora/punica_wrapper/punica_base.py' 2025-09-07T10:19:32.3674039Z #34 11802.4 adding 'vllm/lora/punica_wrapper/punica_cpu.py' 2025-09-07T10:19:32.3674539Z #34 11802.4 adding 'vllm/lora/punica_wrapper/punica_gpu.py' 2025-09-07T10:19:32.3675042Z #34 11802.4 adding 'vllm/lora/punica_wrapper/punica_selector.py' 2025-09-07T10:19:32.3675561Z #34 11802.4 adding 'vllm/lora/punica_wrapper/punica_tpu.py' 2025-09-07T10:19:32.3676054Z #34 11802.4 adding 'vllm/lora/punica_wrapper/punica_xpu.py' 2025-09-07T10:19:32.3676512Z #34 11802.4 adding 'vllm/lora/punica_wrapper/utils.py' 2025-09-07T10:19:32.3676959Z #34 11802.4 adding 'vllm/model_executor/__init__.py' 2025-09-07T10:19:32.3677382Z #34 11802.4 adding 'vllm/model_executor/custom_op.py' 2025-09-07T10:19:32.3677826Z #34 11802.4 adding 'vllm/model_executor/parameter.py' 2025-09-07T10:19:32.3678286Z #34 11802.4 adding 'vllm/model_executor/sampling_metadata.py' 2025-09-07T10:19:32.3678749Z #34 11802.4 adding 'vllm/model_executor/utils.py' 2025-09-07T10:19:32.3679255Z #34 11802.4 adding 'vllm/model_executor/layers/__init__.py' 2025-09-07T10:19:32.3679746Z #34 11802.4 adding 'vllm/model_executor/layers/activation.py' 2025-09-07T10:19:32.3680298Z #34 11802.4 adding 'vllm/model_executor/layers/attention_layer_base.py' 2025-09-07T10:19:32.3680838Z #34 11802.4 adding 'vllm/model_executor/layers/layernorm.py' 2025-09-07T10:19:32.3681360Z #34 11802.4 adding 'vllm/model_executor/layers/lightning_attn.py' 2025-09-07T10:19:32.3681850Z #34 11802.4 adding 'vllm/model_executor/layers/linear.py' 2025-09-07T10:19:32.3682367Z #34 11802.4 adding 'vllm/model_executor/layers/logits_processor.py' 2025-09-07T10:19:32.3682871Z #34 11802.4 adding 'vllm/model_executor/layers/mla.py' 2025-09-07T10:19:32.3683318Z #34 11802.4 adding 'vllm/model_executor/layers/pooler.py' 2025-09-07T10:19:32.3683806Z #34 11802.4 adding 'vllm/model_executor/layers/resampler.py' 2025-09-07T10:19:32.3684288Z #34 11802.4 adding 'vllm/model_executor/layers/sampler.py' 
2025-09-07T10:19:32.3684760Z #34 11802.4 adding 'vllm/model_executor/layers/utils.py' 2025-09-07T10:19:32.3685286Z #34 11802.4 adding 'vllm/model_executor/layers/vocab_parallel_embedding.py' 2025-09-07T10:19:32.3685880Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/__init__.py' 2025-09-07T10:19:32.3686488Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py' 2025-09-07T10:19:32.3687251Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/batched_triton_or_deep_gemm_moe.py' 2025-09-07T10:19:32.3687925Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/config.py' 2025-09-07T10:19:32.3688487Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/cpu_fused_moe.py' 2025-09-07T10:19:32.3689092Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/cutlass_moe.py' 2025-09-07T10:19:32.3689995Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/deep_gemm_moe.py' 2025-09-07T10:19:32.3690617Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/deep_gemm_utils.py' 2025-09-07T10:19:32.3691683Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py' 2025-09-07T10:19:32.3692439Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize.py' 2025-09-07T10:19:32.3693185Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py' 2025-09-07T10:19:32.3693970Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py' 2025-09-07T10:19:32.3694733Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/fused_batched_moe.py' 2025-09-07T10:19:32.3695382Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/fused_marlin_moe.py' 2025-09-07T10:19:32.3696007Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/fused_moe.py' 2025-09-07T10:19:32.4606360Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py' 2025-09-07T10:19:32.4607169Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/layer.py' 2025-09-07T10:19:32.4607824Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/modular_kernel.py' 2025-09-07T10:19:32.4608481Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/moe_align_block_size.py' 2025-09-07T10:19:32.4609102Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/moe_pallas.py' 2025-09-07T10:19:32.4609741Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py' 2025-09-07T10:19:32.4610409Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/moe_torch_iterative.py' 2025-09-07T10:19:32.4611367Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py' 2025-09-07T10:19:32.4612057Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/prepare_finalize.py' 2025-09-07T10:19:32.4612722Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py' 2025-09-07T10:19:32.4613407Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/routing_simulator.py' 2025-09-07T10:19:32.4614224Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/topk_weight_and_reduce.py' 2025-09-07T10:19:32.4614934Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe.py' 2025-09-07T10:19:32.4615580Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/trtllm_moe.py' 2025-09-07T10:19:32.4616150Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/utils.py' 2025-09-07T10:19:32.4617023Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4618109Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4619197Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4620280Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4621346Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4622493Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4623658Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4624760Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4625804Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4626836Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4627925Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4628954Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4629991Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4630969Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4631940Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4632882Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json' 2025-09-07T10:19:32.4633786Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4634715Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4635714Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4636899Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4638144Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4639359Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4640415Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json' 2025-09-07T10:19:32.4641317Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4642416Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4643468Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4644406Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4645399Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4646462Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4647659Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4649105Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4650515Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4651821Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4652973Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4654057Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4655124Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4656086Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4657031Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4658059Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4659047Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json' 2025-09-07T10:19:32.4659956Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json' 2025-09-07T10:19:32.4660927Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json' 2025-09-07T10:19:32.4661919Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4663074Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4664130Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4665181Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4666232Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4667280Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4668240Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4669208Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4670245Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4671399Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4672513Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4673605Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4674652Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4675678Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4676788Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4677842Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4678873Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json' 2025-09-07T10:19:32.4679969Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4681030Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T10:19:32.4681969Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json' 2025-09-07T10:19:32.4682877Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json' 2025-09-07T10:19:32.4683952Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4685158Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4686364Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4687554Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4688750Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4689933Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4691449Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4692678Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json' 2025-09-07T10:19:32.4693937Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4695299Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4696551Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json' 2025-09-07T10:19:32.4697788Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4699045Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json' 2025-09-07T10:19:32.4700322Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4701592Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4702823Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4704240Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4705516Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4706812Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4708047Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4709239Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4710459Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4711656Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4712868Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4714081Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4715326Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4716462Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T10:19:32.4717541Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4718791Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4719992Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4721106Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4722056Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4723012Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4723965Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4724917Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4725882Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4726841Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4727817Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T10:19:32.4728879Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4729905Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4730976Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4732113Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4733097Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4734191Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4735270Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4736262Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4737232Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4738237Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4739257Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4740306Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4741321Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4742315Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4743398Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4744333Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4745245Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4746213Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T10:19:32.4747258Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4748339Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4749763Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4750779Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4751756Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4752732Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4753690Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4754601Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json' 2025-09-07T10:19:32.4755542Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4756544Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json' 
2025-09-07T10:19:32.4757679Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4758731Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4759799Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4760851Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4762017Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4763081Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4764025Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4765020Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4766040Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4767105Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4768133Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4769140Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4770202Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4771283Z #34 
11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4772490Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4773486Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json' 2025-09-07T10:19:32.4774478Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4775479Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4776498Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4777519Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4778537Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4779572Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4780625Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4781674Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4782657Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4783824Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4784845Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4785958Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.4787063Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4787999Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4788983Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4790002Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4791006Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4792024Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4792978Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json' 2025-09-07T10:19:32.4793949Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4794993Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4796109Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4797143Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4798123Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4799120Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4800011Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json' 2025-09-07T10:19:32.4800985Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4802003Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4803007Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4804027Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4804996Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4806045Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4807077Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4808052Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4808999Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4809980Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4811060Z #34 11802.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4812290Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4813344Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4814362Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T10:19:32.4815437Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4816490Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T10:19:32.4817508Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4818482Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json' 2025-09-07T10:19:32.4819488Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4820540Z #34 11802.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json' 2025-09-07T10:19:32.4821580Z #34 11802.5 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4822629Z #34 11802.5 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json' 2025-09-07T10:19:32.4823799Z #34 11802.5 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4824835Z #34 11802.5 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T10:19:32.4825632Z #34 11802.5 adding 'vllm/model_executor/layers/fused_moe/configs/README' 2025-09-07T10:19:32.4826185Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/__init__.py' 2025-09-07T10:19:32.4826756Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/abstract.py' 2025-09-07T10:19:32.4827301Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/linear_attn.py' 2025-09-07T10:19:32.4827869Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/mamba2_metadata.py' 2025-09-07T10:19:32.4828447Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/mamba_mixer.py' 2025-09-07T10:19:32.4828999Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/mamba_mixer2.py' 2025-09-07T10:19:32.4829561Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/mamba_utils.py' 2025-09-07T10:19:32.4830106Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/short_conv.py' 2025-09-07T10:19:32.4830670Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/__init__.py' 2025-09-07T10:19:32.4831260Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/causal_conv1d.py' 2025-09-07T10:19:32.4831870Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/layernorm_gated.py' 2025-09-07T10:19:32.4832507Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/mamba_ssm.py' 2025-09-07T10:19:32.4833060Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/ssd_bmm.py' 2025-09-07T10:19:32.4833645Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/ssd_chunk_scan.py' 2025-09-07T10:19:32.4834255Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py' 2025-09-07T10:19:32.4834870Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/ssd_combined.py' 2025-09-07T10:19:32.4835498Z #34 11802.5 adding 'vllm/model_executor/layers/mamba/ops/ssd_state_passing.py' 2025-09-07T10:19:32.4836114Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/__init__.py' 2025-09-07T10:19:32.4836718Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/auto_round.py' 2025-09-07T10:19:32.4837290Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/awq.py' 2025-09-07T10:19:32.4837872Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/awq_marlin.py' 2025-09-07T10:19:32.4838474Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/awq_triton.py' 2025-09-07T10:19:32.4839088Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/base_config.py' 2025-09-07T10:19:32.4839724Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/bitblas.py' 2025-09-07T10:19:32.4840331Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/bitsandbytes.py' 2025-09-07T10:19:32.4840954Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/deepgemm.py' 2025-09-07T10:19:32.4841558Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/deepspeedfp.py' 2025-09-07T10:19:32.4842192Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/experts_int8.py' 2025-09-07T10:19:32.4842816Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/fbgemm_fp8.py' 2025-09-07T10:19:32.4843392Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/fp8.py' 2025-09-07T10:19:32.4843957Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/gguf.py' 2025-09-07T10:19:32.4844516Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/gptq.py' 2025-09-07T10:19:32.4845122Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/gptq_bitblas.py' 2025-09-07T10:19:32.4845744Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/gptq_marlin.py' 2025-09-07T10:19:32.4846384Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/gptq_marlin_24.py' 2025-09-07T10:19:32.4847019Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/hqq_marlin.py' 2025-09-07T10:19:32.4847590Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/inc.py' 2025-09-07T10:19:32.4848226Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/input_quant_fp8.py' 2025-09-07T10:19:32.4849039Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/ipex_quant.py' 2025-09-07T10:19:32.4849841Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kv_cache.py' 2025-09-07T10:19:32.4850449Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/modelopt.py' 2025-09-07T10:19:32.4851231Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/moe_wna16.py' 2025-09-07T10:19:32.4851845Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/mxfp4.py' 2025-09-07T10:19:32.4852429Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/petit.py' 2025-09-07T10:19:32.4853038Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/ptpc_fp8.py' 2025-09-07T10:19:32.4853626Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/rtn.py' 2025-09-07T10:19:32.4854211Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/schema.py' 2025-09-07T10:19:32.4854804Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/torchao.py' 2025-09-07T10:19:32.4855415Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/tpu_int8.py' 2025-09-07T10:19:32.4856133Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/__init__.py' 2025-09-07T10:19:32.4856995Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py' 2025-09-07T10:19:32.4858010Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py' 2025-09-07T10:19:32.4858941Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm.py' 2025-09-07T10:19:32.4859783Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/utils.py' 2025-09-07T10:19:32.4860624Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py' 2025-09-07T10:19:32.4861595Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24.py' 2025-09-07T10:19:32.4862777Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py' 2025-09-07T10:19:32.4863839Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py' 2025-09-07T10:19:32.4864938Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py' 2025-09-07T10:19:32.4866083Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py' 2025-09-07T10:19:32.4867159Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8.py' 2025-09-07T10:19:32.4868230Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int.py' 2025-09-07T10:19:32.4869301Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py' 2025-09-07T10:19:32.4870378Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py' 2025-09-07T10:19:32.4871461Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py' 2025-09-07T10:19:32.4872520Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py' 2025-09-07T10:19:32.4873485Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/linear.py' 2025-09-07T10:19:32.4874350Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/module.py' 2025-09-07T10:19:32.4875225Z #34 
11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/utils.py' 2025-09-07T10:19:32.4876255Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py' 2025-09-07T10:19:32.4877135Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/__init__.py' 2025-09-07T10:19:32.4877936Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py' 2025-09-07T10:19:32.4878809Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py' 2025-09-07T10:19:32.4879692Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark.py' 2025-09-07T10:19:32.4880543Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py' 2025-09-07T10:19:32.4881357Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/conch.py' 2025-09-07T10:19:32.4882183Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass.py' 2025-09-07T10:19:32.4883033Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py' 2025-09-07T10:19:32.4883895Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama.py' 2025-09-07T10:19:32.4884729Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py' 2025-09-07T10:19:32.4885547Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py' 2025-09-07T10:19:32.4886452Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel.py' 2025-09-07T10:19:32.4887283Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/__init__.py' 2025-09-07T10:19:32.4888039Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py' 
2025-09-07T10:19:32.4888774Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py' 2025-09-07T10:19:32.4889501Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py' 2025-09-07T10:19:32.4890254Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py' 2025-09-07T10:19:32.4891082Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/xla.py' 2025-09-07T10:19:32.4891968Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/__init__.py' 2025-09-07T10:19:32.4892610Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/quark.py' 2025-09-07T10:19:32.4893281Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/quark_moe.py' 2025-09-07T10:19:32.4893940Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/utils.py' 2025-09-07T10:19:32.4894662Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/schemes/__init__.py' 2025-09-07T10:19:32.4895436Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py' 2025-09-07T10:19:32.4896246Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py' 2025-09-07T10:19:32.4897079Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py' 2025-09-07T10:19:32.4897905Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py' 2025-09-07T10:19:32.4898644Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/__init__.py' 2025-09-07T10:19:32.4899341Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/allspark_utils.py' 2025-09-07T10:19:32.4900061Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/bitblas_utils.py' 2025-09-07T10:19:32.4900806Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py' 2025-09-07T10:19:32.4901554Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/flashinfer_utils.py' 2025-09-07T10:19:32.4902271Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/fp8_utils.py' 2025-09-07T10:19:32.4902953Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/gptq_utils.py' 2025-09-07T10:19:32.4903770Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/int8_utils.py' 2025-09-07T10:19:32.5617888Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/layer_utils.py' 2025-09-07T10:19:32.5618644Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/machete_utils.py' 2025-09-07T10:19:32.5619361Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils.py' 2025-09-07T10:19:32.5620327Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py' 2025-09-07T10:19:32.5621077Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py' 2025-09-07T10:19:32.5621844Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_test.py' 2025-09-07T10:19:32.5622610Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py' 2025-09-07T10:19:32.5623359Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/mxfp4_utils.py' 2025-09-07T10:19:32.5624065Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/mxfp8_utils.py' 2025-09-07T10:19:32.5624807Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py' 2025-09-07T10:19:32.5625598Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/nvfp4_moe_support.py' 2025-09-07T10:19:32.5626314Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/petit_utils.py' 2025-09-07T10:19:32.5627072Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/quant_utils.py' 2025-09-07T10:19:32.5627948Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/w8a8_utils.py' 
2025-09-07T10:19:32.5629008Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5630439Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5631844Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5633377Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5634838Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5636357Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5637826Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5639286Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5640745Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5642108Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5643521Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5644949Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5646629Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5648090Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5649928Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5651705Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5653147Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5654559Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5656057Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5657453Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5658961Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5660509Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5661984Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5663580Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5665025Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5666477Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5667984Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5669331Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5670694Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5672100Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5673498Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5675015Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5676458Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5677898Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5679409Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5680878Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5682322Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5683760Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5685112Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5686479Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5688062Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5689632Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5691346Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5692853Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5694621Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5696154Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5697678Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5699145Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5700668Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5702115Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5703797Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5705173Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5706532Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5707975Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5709533Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5710995Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5712627Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5714093Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5715593Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5717039Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5718503Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5719990Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5721393Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5722873Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5724327Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5725891Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5727250Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5728614Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5729974Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5731639Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5733193Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5747895Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5749734Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5751200Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5752809Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5754214Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5755620Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5757169Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5758670Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5760353Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5762033Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5763497Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5764917Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5766317Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5767722Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5769291Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5770647Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5772295Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5773694Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5775154Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5776657Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5778387Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5779913Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5781538Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5782975Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5784521Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5786018Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5787511Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5788912Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5790354Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5791752Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5793189Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5794621Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5796154Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5797563Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5798993Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5800445Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5801897Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5803291Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5804694Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5806196Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5807546Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5808944Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5810399Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5812279Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5813734Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5815116Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5816520Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5817920Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5819344Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5820870Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5822398Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5824035Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5825515Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5826928Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5828329Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5829875Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5831222Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5832571Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5833949Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5835355Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5836741Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5838182Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5839635Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5841093Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5842641Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5844102Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5845495Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5846953Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5848326Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5850268Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5851838Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5853419Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5854923Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5856434Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5857974Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5859427Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5860833Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5862223Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5863742Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5865245Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5866682Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5868139Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5869606Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5871040Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5872435Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5873836Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5875226Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5876664Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5878103Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5879572Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5881039Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5882459Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5883867Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5885302Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5886654Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5888029Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5889428Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5890854Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5892475Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5893954Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5895457Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5897011Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5898519Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5899988Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5901443Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5902885Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5904389Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5905783Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5907134Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5908501Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5909900Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5911327Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5912771Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5914167Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5915598Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5916995Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5918341Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5920226Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5921624Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5923051Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5924499Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5925907Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5927327Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5928726Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5930069Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5931695Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5933126Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5934585Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5936063Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5937545Z #34 11802.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5938923Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5940307Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5941695Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5943114Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5944667Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5946100Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5947573Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5949168Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5950811Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T10:19:32.5952012Z #34 11802.5 adding 'vllm/model_executor/layers/quantization/utils/configs/README.md' 2025-09-07T10:19:32.5952714Z #34 11802.5 
adding 'vllm/model_executor/layers/rotary_embedding/__init__.py' 2025-09-07T10:19:32.5953334Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/base.py' 2025-09-07T10:19:32.5953956Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/common.py' 2025-09-07T10:19:32.5954649Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py' 2025-09-07T10:19:32.5955399Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/dual_chunk_rope.py' 2025-09-07T10:19:32.5956146Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope.py' 2025-09-07T10:19:32.5956929Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope.py' 2025-09-07T10:19:32.5957735Z #34 11802.5 adding 'vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope.py' 2025-09-07T10:19:32.5958454Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/linear_scaling_rope.py' 2025-09-07T10:19:32.5959172Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/llama3_rope.py' 2025-09-07T10:19:32.5959864Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/llama4_vision_rope.py' 2025-09-07T10:19:32.5960544Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/mrope.py' 2025-09-07T10:19:32.5961325Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope.py' 2025-09-07T10:19:32.5962064Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py' 2025-09-07T10:19:32.5962817Z #34 11802.6 adding 'vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope.py' 2025-09-07T10:19:32.5963487Z #34 11802.6 adding 'vllm/model_executor/layers/shared_fused_moe/__init__.py' 2025-09-07T10:19:32.5964137Z #34 11802.6 adding 'vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py' 2025-09-07T10:19:32.5964749Z #34 11802.6 adding 'vllm/model_executor/model_loader/__init__.py' 2025-09-07T10:19:32.5965324Z 
#34 11802.6 adding 'vllm/model_executor/model_loader/base_loader.py' 2025-09-07T10:19:32.5965917Z #34 11802.6 adding 'vllm/model_executor/model_loader/bitsandbytes_loader.py' 2025-09-07T10:19:32.5966514Z #34 11802.6 adding 'vllm/model_executor/model_loader/default_loader.py' 2025-09-07T10:19:32.5967086Z #34 11802.6 adding 'vllm/model_executor/model_loader/dummy_loader.py' 2025-09-07T10:19:32.5967633Z #34 11802.6 adding 'vllm/model_executor/model_loader/gguf_loader.py' 2025-09-07T10:19:32.5968229Z #34 11802.6 adding 'vllm/model_executor/model_loader/runai_streamer_loader.py' 2025-09-07T10:19:32.5968861Z #34 11802.6 adding 'vllm/model_executor/model_loader/sharded_state_loader.py' 2025-09-07T10:19:32.5969444Z #34 11802.6 adding 'vllm/model_executor/model_loader/tensorizer.py' 2025-09-07T10:19:32.5970018Z #34 11802.6 adding 'vllm/model_executor/model_loader/tensorizer_loader.py' 2025-09-07T10:19:32.5970561Z #34 11802.6 adding 'vllm/model_executor/model_loader/tpu.py' 2025-09-07T10:19:32.5971289Z #34 11802.6 adding 'vllm/model_executor/model_loader/utils.py' 2025-09-07T10:19:32.5971828Z #34 11802.6 adding 'vllm/model_executor/model_loader/weight_utils.py' 2025-09-07T10:19:32.5972381Z #34 11802.6 adding 'vllm/model_executor/models/__init__.py' 2025-09-07T10:19:32.5972878Z #34 11802.6 adding 'vllm/model_executor/models/adapters.py' 2025-09-07T10:19:32.5973350Z #34 11802.6 adding 'vllm/model_executor/models/aimv2.py' 2025-09-07T10:19:32.5973880Z #34 11802.6 adding 'vllm/model_executor/models/apertus.py' 2025-09-07T10:19:32.5974346Z #34 11802.6 adding 'vllm/model_executor/models/arcee.py' 2025-09-07T10:19:32.5974818Z #34 11802.6 adding 'vllm/model_executor/models/arctic.py' 2025-09-07T10:19:32.5975285Z #34 11802.6 adding 'vllm/model_executor/models/aria.py' 2025-09-07T10:19:32.5975752Z #34 11802.6 adding 'vllm/model_executor/models/aya_vision.py' 2025-09-07T10:19:32.5976309Z #34 11802.6 adding 'vllm/model_executor/models/baichuan.py' 2025-09-07T10:19:32.5976810Z #34 
11802.6 adding 'vllm/model_executor/models/bailing_moe.py' 2025-09-07T10:19:32.5977289Z #34 11802.6 adding 'vllm/model_executor/models/bamba.py' 2025-09-07T10:19:32.5977747Z #34 11802.6 adding 'vllm/model_executor/models/bart.py' 2025-09-07T10:19:32.5978187Z #34 11802.6 adding 'vllm/model_executor/models/bert.py' 2025-09-07T10:19:32.5978679Z #34 11802.6 adding 'vllm/model_executor/models/bert_with_rope.py' 2025-09-07T10:19:32.5979169Z #34 11802.6 adding 'vllm/model_executor/models/blip.py' 2025-09-07T10:19:32.5979628Z #34 11802.6 adding 'vllm/model_executor/models/blip2.py' 2025-09-07T10:19:32.5980092Z #34 11802.6 adding 'vllm/model_executor/models/bloom.py' 2025-09-07T10:19:32.5980558Z #34 11802.6 adding 'vllm/model_executor/models/chameleon.py' 2025-09-07T10:19:32.5981047Z #34 11802.6 adding 'vllm/model_executor/models/chatglm.py' 2025-09-07T10:19:32.5981533Z #34 11802.6 adding 'vllm/model_executor/models/clip.py' 2025-09-07T10:19:32.5982023Z #34 11802.6 adding 'vllm/model_executor/models/cohere2_vision.py' 2025-09-07T10:19:32.5982527Z #34 11802.6 adding 'vllm/model_executor/models/commandr.py' 2025-09-07T10:19:32.5983010Z #34 11802.6 adding 'vllm/model_executor/models/config.py' 2025-09-07T10:19:32.5983629Z #34 11802.6 adding 'vllm/model_executor/models/constant_size_cache.py' 2025-09-07T10:19:32.5984131Z #34 11802.6 adding 'vllm/model_executor/models/dbrx.py' 2025-09-07T10:19:32.5984582Z #34 11802.6 adding 'vllm/model_executor/models/deepseek.py' 2025-09-07T10:19:32.5985075Z #34 11802.6 adding 'vllm/model_executor/models/deepseek_eagle.py' 2025-09-07T10:19:32.5985584Z #34 11802.6 adding 'vllm/model_executor/models/deepseek_mtp.py' 2025-09-07T10:19:32.5986075Z #34 11802.6 adding 'vllm/model_executor/models/deepseek_v2.py' 2025-09-07T10:19:32.5986575Z #34 11802.6 adding 'vllm/model_executor/models/deepseek_vl2.py' 2025-09-07T10:19:32.5987043Z #34 11802.6 adding 'vllm/model_executor/models/donut.py' 2025-09-07T10:19:32.5987495Z #34 11802.6 adding 
'vllm/model_executor/models/dots1.py' 2025-09-07T10:19:32.5987948Z #34 11802.6 adding 'vllm/model_executor/models/ernie45.py' 2025-09-07T10:19:32.5988454Z #34 11802.6 adding 'vllm/model_executor/models/ernie45_moe.py' 2025-09-07T10:19:32.5988945Z #34 11802.6 adding 'vllm/model_executor/models/ernie45_vl.py' 2025-09-07T10:19:32.5989432Z #34 11802.6 adding 'vllm/model_executor/models/ernie45_vl_moe.py' 2025-09-07T10:19:32.5989936Z #34 11802.6 adding 'vllm/model_executor/models/ernie_mtp.py' 2025-09-07T10:19:32.5990395Z #34 11802.6 adding 'vllm/model_executor/models/exaone.py' 2025-09-07T10:19:32.5990862Z #34 11802.6 adding 'vllm/model_executor/models/exaone4.py' 2025-09-07T10:19:32.5991352Z #34 11802.6 adding 'vllm/model_executor/models/fairseq2_llama.py' 2025-09-07T10:19:32.5991827Z #34 11802.6 adding 'vllm/model_executor/models/falcon.py' 2025-09-07T10:19:32.5992293Z #34 11802.6 adding 'vllm/model_executor/models/falcon_h1.py' 2025-09-07T10:19:32.5992765Z #34 11802.6 adding 'vllm/model_executor/models/florence2.py' 2025-09-07T10:19:32.5993221Z #34 11802.6 adding 'vllm/model_executor/models/fuyu.py' 2025-09-07T10:19:32.5993650Z #34 11802.6 adding 'vllm/model_executor/models/gemma.py' 2025-09-07T10:19:32.5994102Z #34 11802.6 adding 'vllm/model_executor/models/gemma2.py' 2025-09-07T10:19:32.5994549Z #34 11802.6 adding 'vllm/model_executor/models/gemma3.py' 2025-09-07T10:19:32.5995014Z #34 11802.6 adding 'vllm/model_executor/models/gemma3_mm.py' 2025-09-07T10:19:32.5995490Z #34 11802.6 adding 'vllm/model_executor/models/gemma3n.py' 2025-09-07T10:19:32.5995987Z #34 11802.6 adding 'vllm/model_executor/models/gemma3n_mm.py' 2025-09-07T10:19:32.5996447Z #34 11802.6 adding 'vllm/model_executor/models/glm.py' 2025-09-07T10:19:32.5996867Z #34 11802.6 adding 'vllm/model_executor/models/glm4.py' 2025-09-07T10:19:32.5997309Z #34 11802.6 adding 'vllm/model_executor/models/glm4_1v.py' 2025-09-07T10:19:32.6621559Z #34 11802.6 adding 'vllm/model_executor/models/glm4_moe.py' 
2025-09-07T10:19:32.6622296Z #34 11802.6 adding 'vllm/model_executor/models/glm4_moe_mtp.py' 2025-09-07T10:19:32.6622803Z #34 11802.6 adding 'vllm/model_executor/models/glm4v.py' 2025-09-07T10:19:32.6623386Z #34 11802.6 adding 'vllm/model_executor/models/gpt2.py' 2025-09-07T10:19:32.6623856Z #34 11802.6 adding 'vllm/model_executor/models/gpt_bigcode.py' 2025-09-07T10:19:32.6624334Z #34 11802.6 adding 'vllm/model_executor/models/gpt_j.py' 2025-09-07T10:19:32.6624788Z #34 11802.6 adding 'vllm/model_executor/models/gpt_neox.py' 2025-09-07T10:19:32.6625257Z #34 11802.6 adding 'vllm/model_executor/models/gpt_oss.py' 2025-09-07T10:19:32.6625719Z #34 11802.6 adding 'vllm/model_executor/models/granite.py' 2025-09-07T10:19:32.6626211Z #34 11802.6 adding 'vllm/model_executor/models/granite_speech.py' 2025-09-07T10:19:32.6626712Z #34 11802.6 adding 'vllm/model_executor/models/granitemoe.py' 2025-09-07T10:19:32.6627236Z #34 11802.6 adding 'vllm/model_executor/models/granitemoehybrid.py' 2025-09-07T10:19:32.6627858Z #34 11802.6 adding 'vllm/model_executor/models/granitemoeshared.py' 2025-09-07T10:19:32.6628356Z #34 11802.6 adding 'vllm/model_executor/models/gritlm.py' 2025-09-07T10:19:32.6628813Z #34 11802.6 adding 'vllm/model_executor/models/grok1.py' 2025-09-07T10:19:32.6629250Z #34 11802.6 adding 'vllm/model_executor/models/h2ovl.py' 2025-09-07T10:19:32.6629722Z #34 11802.6 adding 'vllm/model_executor/models/hunyuan_v1.py' 2025-09-07T10:19:32.6630241Z #34 11802.6 adding 'vllm/model_executor/models/hyperclovax_vision.py' 2025-09-07T10:19:32.6630817Z #34 11802.6 adding 'vllm/model_executor/models/idefics2_vision_model.py' 2025-09-07T10:19:32.6631356Z #34 11802.6 adding 'vllm/model_executor/models/idefics3.py' 2025-09-07T10:19:32.6631832Z #34 11802.6 adding 'vllm/model_executor/models/interfaces.py' 2025-09-07T10:19:32.6632352Z #34 11802.6 adding 'vllm/model_executor/models/interfaces_base.py' 2025-09-07T10:19:32.6632851Z #34 11802.6 adding 
'vllm/model_executor/models/intern_vit.py' 2025-09-07T10:19:32.6633343Z #34 11802.6 adding 'vllm/model_executor/models/internlm2.py' 2025-09-07T10:19:32.6633829Z #34 11802.6 adding 'vllm/model_executor/models/internlm2_ve.py' 2025-09-07T10:19:32.6634320Z #34 11802.6 adding 'vllm/model_executor/models/interns1.py' 2025-09-07T10:19:32.6634859Z #34 11802.6 adding 'vllm/model_executor/models/interns1_vit.py' 2025-09-07T10:19:32.6635337Z #34 11802.6 adding 'vllm/model_executor/models/internvl.py' 2025-09-07T10:19:32.6635798Z #34 11802.6 adding 'vllm/model_executor/models/jais.py' 2025-09-07T10:19:32.6636231Z #34 11802.6 adding 'vllm/model_executor/models/jamba.py' 2025-09-07T10:19:32.6636687Z #34 11802.6 adding 'vllm/model_executor/models/jina_vl.py' 2025-09-07T10:19:32.6637133Z #34 11802.6 adding 'vllm/model_executor/models/keye.py' 2025-09-07T10:19:32.6637592Z #34 11802.6 adding 'vllm/model_executor/models/keye_vl1_5.py' 2025-09-07T10:19:32.6638071Z #34 11802.6 adding 'vllm/model_executor/models/kimi_vl.py' 2025-09-07T10:19:32.6638512Z #34 11802.6 adding 'vllm/model_executor/models/lfm2.py' 2025-09-07T10:19:32.6638957Z #34 11802.6 adding 'vllm/model_executor/models/llama.py' 2025-09-07T10:19:32.6639399Z #34 11802.6 adding 'vllm/model_executor/models/llama4.py' 2025-09-07T10:19:32.6639873Z #34 11802.6 adding 'vllm/model_executor/models/llama4_eagle.py' 2025-09-07T10:19:32.6640358Z #34 11802.6 adding 'vllm/model_executor/models/llama_eagle.py' 2025-09-07T10:19:32.6640853Z #34 11802.6 adding 'vllm/model_executor/models/llama_eagle3.py' 2025-09-07T10:19:32.6641322Z #34 11802.6 adding 'vllm/model_executor/models/llava.py' 2025-09-07T10:19:32.6641786Z #34 11802.6 adding 'vllm/model_executor/models/llava_next.py' 2025-09-07T10:19:32.6642341Z #34 11802.6 adding 'vllm/model_executor/models/llava_next_video.py' 2025-09-07T10:19:32.6642870Z #34 11802.6 adding 'vllm/model_executor/models/llava_onevision.py' 2025-09-07T10:19:32.6643364Z #34 11802.6 adding 
'vllm/model_executor/models/mamba.py' 2025-09-07T10:19:32.6643802Z #34 11802.6 adding 'vllm/model_executor/models/mamba2.py' 2025-09-07T10:19:32.6644306Z #34 11802.6 adding 'vllm/model_executor/models/mamba_cache.py' 2025-09-07T10:19:32.6644771Z #34 11802.6 adding 'vllm/model_executor/models/medusa.py' 2025-09-07T10:19:32.6645245Z #34 11802.6 adding 'vllm/model_executor/models/midashenglm.py' 2025-09-07T10:19:32.6645716Z #34 11802.6 adding 'vllm/model_executor/models/mimo.py' 2025-09-07T10:19:32.6646160Z #34 11802.6 adding 'vllm/model_executor/models/mimo_mtp.py' 2025-09-07T10:19:32.6646635Z #34 11802.6 adding 'vllm/model_executor/models/minicpm.py' 2025-09-07T10:19:32.6647100Z #34 11802.6 adding 'vllm/model_executor/models/minicpm3.py' 2025-09-07T10:19:32.6647598Z #34 11802.6 adding 'vllm/model_executor/models/minicpm_eagle.py' 2025-09-07T10:19:32.6648090Z #34 11802.6 adding 'vllm/model_executor/models/minicpmo.py' 2025-09-07T10:19:32.6648570Z #34 11802.6 adding 'vllm/model_executor/models/minicpmv.py' 2025-09-07T10:19:32.6649272Z #34 11802.6 adding 'vllm/model_executor/models/minimax_cache.py' 2025-09-07T10:19:32.6649975Z #34 11802.6 adding 'vllm/model_executor/models/minimax_text_01.py' 2025-09-07T10:19:32.6650577Z #34 11802.6 adding 'vllm/model_executor/models/minimax_vl_01.py' 2025-09-07T10:19:32.6651169Z #34 11802.6 adding 'vllm/model_executor/models/mistral3.py' 2025-09-07T10:19:32.6651658Z #34 11802.6 adding 'vllm/model_executor/models/mixtral.py' 2025-09-07T10:19:32.6652157Z #34 11802.6 adding 'vllm/model_executor/models/mixtral_quant.py' 2025-09-07T10:19:32.6652660Z #34 11802.6 adding 'vllm/model_executor/models/mllama.py' 2025-09-07T10:19:32.6653139Z #34 11802.7 adding 'vllm/model_executor/models/mllama4.py' 2025-09-07T10:19:32.6653648Z #34 11802.7 adding 'vllm/model_executor/models/mlp_speculator.py' 2025-09-07T10:19:32.6654180Z #34 11802.7 adding 'vllm/model_executor/models/modernbert.py' 2025-09-07T10:19:32.6654701Z #34 11802.7 adding 
'vllm/model_executor/models/module_mapping.py' 2025-09-07T10:19:32.6655223Z #34 11802.7 adding 'vllm/model_executor/models/molmo.py' 2025-09-07T10:19:32.6655692Z #34 11802.7 adding 'vllm/model_executor/models/moonvit.py' 2025-09-07T10:19:32.6656153Z #34 11802.7 adding 'vllm/model_executor/models/mpt.py' 2025-09-07T10:19:32.6656621Z #34 11802.7 adding 'vllm/model_executor/models/nemotron.py' 2025-09-07T10:19:32.6657157Z #34 11802.7 adding 'vllm/model_executor/models/nemotron_h.py' 2025-09-07T10:19:32.6657669Z #34 11802.7 adding 'vllm/model_executor/models/nemotron_nas.py' 2025-09-07T10:19:32.6658173Z #34 11802.7 adding 'vllm/model_executor/models/nemotron_vl.py' 2025-09-07T10:19:32.6658661Z #34 11802.7 adding 'vllm/model_executor/models/nvlm_d.py' 2025-09-07T10:19:32.6659111Z #34 11802.7 adding 'vllm/model_executor/models/olmo.py' 2025-09-07T10:19:32.6659571Z #34 11802.7 adding 'vllm/model_executor/models/olmo2.py' 2025-09-07T10:19:32.6660036Z #34 11802.7 adding 'vllm/model_executor/models/olmoe.py' 2025-09-07T10:19:32.6660478Z #34 11802.7 adding 'vllm/model_executor/models/opt.py' 2025-09-07T10:19:32.6660931Z #34 11802.7 adding 'vllm/model_executor/models/orion.py' 2025-09-07T10:19:32.6661380Z #34 11802.7 adding 'vllm/model_executor/models/ovis.py' 2025-09-07T10:19:32.6661845Z #34 11802.7 adding 'vllm/model_executor/models/ovis2_5.py' 2025-09-07T10:19:32.6662329Z #34 11802.7 adding 'vllm/model_executor/models/paligemma.py' 2025-09-07T10:19:32.6662948Z #34 11802.7 adding 'vllm/model_executor/models/persimmon.py' 2025-09-07T10:19:32.6663403Z #34 11802.7 adding 'vllm/model_executor/models/phi.py' 2025-09-07T10:19:32.6663832Z #34 11802.7 adding 'vllm/model_executor/models/phi3.py' 2025-09-07T10:19:32.6664274Z #34 11802.7 adding 'vllm/model_executor/models/phi3v.py' 2025-09-07T10:19:32.6664801Z #34 11802.7 adding 'vllm/model_executor/models/phi4_multimodal.py' 2025-09-07T10:19:32.6665313Z #34 11802.7 adding 'vllm/model_executor/models/phi4flash.py' 
2025-09-07T10:19:32.6665773Z #34 11802.7 adding 'vllm/model_executor/models/phi4mm.py' 2025-09-07T10:19:32.6666251Z #34 11802.7 adding 'vllm/model_executor/models/phi4mm_audio.py' 2025-09-07T10:19:32.6666748Z #34 11802.7 adding 'vllm/model_executor/models/phi4mm_utils.py' 2025-09-07T10:19:32.6667263Z #34 11802.7 adding 'vllm/model_executor/models/phimoe.py' 2025-09-07T10:19:32.6667724Z #34 11802.7 adding 'vllm/model_executor/models/pixtral.py' 2025-09-07T10:19:32.6668171Z #34 11802.7 adding 'vllm/model_executor/models/plamo2.py' 2025-09-07T10:19:32.6668618Z #34 11802.7 adding 'vllm/model_executor/models/qwen.py' 2025-09-07T10:19:32.6669046Z #34 11802.7 adding 'vllm/model_executor/models/qwen2.py' 2025-09-07T10:19:32.6669550Z #34 11802.7 adding 'vllm/model_executor/models/qwen2_5_omni_thinker.py' 2025-09-07T10:19:32.6670069Z #34 11802.7 adding 'vllm/model_executor/models/qwen2_5_vl.py' 2025-09-07T10:19:32.6670558Z #34 11802.7 adding 'vllm/model_executor/models/qwen2_audio.py' 2025-09-07T10:19:32.6671045Z #34 11802.7 adding 'vllm/model_executor/models/qwen2_moe.py' 2025-09-07T10:19:32.6671510Z #34 11802.7 adding 'vllm/model_executor/models/qwen2_rm.py' 2025-09-07T10:19:32.6671982Z #34 11802.7 adding 'vllm/model_executor/models/qwen2_vl.py' 2025-09-07T10:19:32.6672468Z #34 11802.7 adding 'vllm/model_executor/models/qwen3.py' 2025-09-07T10:19:32.6672926Z #34 11802.7 adding 'vllm/model_executor/models/qwen3_moe.py' 2025-09-07T10:19:32.6673386Z #34 11802.7 adding 'vllm/model_executor/models/qwen_vl.py' 2025-09-07T10:19:32.6673854Z #34 11802.7 adding 'vllm/model_executor/models/registry.py' 2025-09-07T10:19:32.6674326Z #34 11802.7 adding 'vllm/model_executor/models/roberta.py' 2025-09-07T10:19:32.6674763Z #34 11802.7 adding 'vllm/model_executor/models/rvl.py' 2025-09-07T10:19:32.6675208Z #34 11802.7 adding 'vllm/model_executor/models/seed_oss.py' 2025-09-07T10:19:32.6675663Z #34 11802.7 adding 'vllm/model_executor/models/siglip.py' 2025-09-07T10:19:32.6676141Z #34 11802.7 
adding 'vllm/model_executor/models/siglip2navit.py' 2025-09-07T10:19:32.6676633Z #34 11802.7 adding 'vllm/model_executor/models/skyworkr1v.py' 2025-09-07T10:19:32.6677112Z #34 11802.7 adding 'vllm/model_executor/models/smolvlm.py' 2025-09-07T10:19:32.6677565Z #34 11802.7 adding 'vllm/model_executor/models/solar.py' 2025-09-07T10:19:32.6678015Z #34 11802.7 adding 'vllm/model_executor/models/stablelm.py' 2025-09-07T10:19:32.6678500Z #34 11802.7 adding 'vllm/model_executor/models/starcoder2.py' 2025-09-07T10:19:32.6679011Z #34 11802.7 adding 'vllm/model_executor/models/step3_text.py' 2025-09-07T10:19:32.6679494Z #34 11802.7 adding 'vllm/model_executor/models/step3_vl.py' 2025-09-07T10:19:32.6679945Z #34 11802.7 adding 'vllm/model_executor/models/swin.py' 2025-09-07T10:19:32.6680397Z #34 11802.7 adding 'vllm/model_executor/models/tarsier.py' 2025-09-07T10:19:32.7625060Z #34 11802.7 adding 'vllm/model_executor/models/telechat2.py' 2025-09-07T10:19:32.7625591Z #34 11802.7 adding 'vllm/model_executor/models/teleflm.py' 2025-09-07T10:19:32.7626089Z #34 11802.7 adding 'vllm/model_executor/models/terratorch.py' 2025-09-07T10:19:32.7626633Z #34 11802.7 adding 'vllm/model_executor/models/transformers.py' 2025-09-07T10:19:32.7627297Z #34 11802.7 adding 'vllm/model_executor/models/ultravox.py' 2025-09-07T10:19:32.7627863Z #34 11802.7 adding 'vllm/model_executor/models/utils.py' 2025-09-07T10:19:32.7628326Z #34 11802.7 adding 'vllm/model_executor/models/vision.py' 2025-09-07T10:19:32.7628788Z #34 11802.7 adding 'vllm/model_executor/models/voxtral.py' 2025-09-07T10:19:32.7629250Z #34 11802.7 adding 'vllm/model_executor/models/whisper.py' 2025-09-07T10:19:32.7629712Z #34 11802.7 adding 'vllm/model_executor/models/zamba2.py' 2025-09-07T10:19:32.7630171Z #34 11802.7 adding 'vllm/model_executor/warmup/__init__.py' 2025-09-07T10:19:32.7630682Z #34 11802.7 adding 'vllm/model_executor/warmup/deep_gemm_warmup.py' 2025-09-07T10:19:32.7631521Z #34 11802.7 adding 
'vllm/model_executor/warmup/kernel_warmup.py' 2025-09-07T10:19:32.7631978Z #34 11802.7 adding 'vllm/multimodal/__init__.py' 2025-09-07T10:19:32.7632369Z #34 11802.7 adding 'vllm/multimodal/audio.py' 2025-09-07T10:19:32.7632737Z #34 11802.7 adding 'vllm/multimodal/base.py' 2025-09-07T10:19:32.7633169Z #34 11802.7 adding 'vllm/multimodal/cache.py' 2025-09-07T10:19:32.7633541Z #34 11802.7 adding 'vllm/multimodal/hasher.py' 2025-09-07T10:19:32.7633986Z #34 11802.7 adding 'vllm/multimodal/image.py' 2025-09-07T10:19:32.7634357Z #34 11802.7 adding 'vllm/multimodal/inputs.py' 2025-09-07T10:19:32.7634731Z #34 11802.7 adding 'vllm/multimodal/parse.py' 2025-09-07T10:19:32.7635122Z #34 11802.7 adding 'vllm/multimodal/processing.py' 2025-09-07T10:19:32.7635721Z #34 11802.7 adding 'vllm/multimodal/profiling.py' 2025-09-07T10:19:32.7636194Z #34 11802.7 adding 'vllm/multimodal/registry.py' 2025-09-07T10:19:32.7636576Z #34 11802.7 adding 'vllm/multimodal/utils.py' 2025-09-07T10:19:32.7636967Z #34 11802.7 adding 'vllm/multimodal/video.py' 2025-09-07T10:19:32.7637480Z #34 11802.7 adding 'vllm/platforms/__init__.py' 2025-09-07T10:19:32.7637874Z #34 11802.7 adding 'vllm/platforms/cpu.py' 2025-09-07T10:19:32.7638232Z #34 11802.7 adding 'vllm/platforms/cuda.py' 2025-09-07T10:19:32.7638655Z #34 11802.7 adding 'vllm/platforms/interface.py' 2025-09-07T10:19:32.7639075Z #34 11802.7 adding 'vllm/platforms/rocm.py' 2025-09-07T10:19:32.7639549Z #34 11802.7 adding 'vllm/platforms/tpu.py' 2025-09-07T10:19:32.7639906Z #34 11802.7 adding 'vllm/platforms/xpu.py' 2025-09-07T10:19:32.7640281Z #34 11802.7 adding 'vllm/plugins/__init__.py' 2025-09-07T10:19:32.7640711Z #34 11802.7 adding 'vllm/plugins/io_processors/__init__.py' 2025-09-07T10:19:32.7641209Z #34 11802.7 adding 'vllm/plugins/io_processors/interface.py' 2025-09-07T10:19:32.7641681Z #34 11802.7 adding 'vllm/plugins/lora_resolvers/README.md' 2025-09-07T10:19:32.7642164Z #34 11802.7 adding 'vllm/plugins/lora_resolvers/__init__.py' 
2025-09-07T10:19:32.7642698Z #34 11802.7 adding 'vllm/plugins/lora_resolvers/filesystem_resolver.py' 2025-09-07T10:19:32.7643201Z #34 11802.7 adding 'vllm/profiler/__init__.py' 2025-09-07T10:19:32.7643628Z #34 11802.7 adding 'vllm/profiler/layerwise_profile.py' 2025-09-07T10:19:32.7644035Z #34 11802.7 adding 'vllm/profiler/utils.py' 2025-09-07T10:19:32.7644404Z #34 11802.7 adding 'vllm/ray/__init__.py' 2025-09-07T10:19:32.7644755Z #34 11802.7 adding 'vllm/ray/lazy_utils.py' 2025-09-07T10:19:32.7645126Z #34 11802.7 adding 'vllm/ray/ray_env.py' 2025-09-07T10:19:32.7645488Z #34 11802.7 adding 'vllm/reasoning/__init__.py' 2025-09-07T10:19:32.7645936Z #34 11802.7 adding 'vllm/reasoning/abs_reasoning_parsers.py' 2025-09-07T10:19:32.7646578Z #34 11802.7 adding 'vllm/reasoning/deepseek_r1_reasoning_parser.py' 2025-09-07T10:19:32.7647116Z #34 11802.7 adding 'vllm/reasoning/glm4_moe_reasoning_parser.py' 2025-09-07T10:19:32.7647636Z #34 11802.7 adding 'vllm/reasoning/gptoss_reasoning_parser.py' 2025-09-07T10:19:32.7648136Z #34 11802.7 adding 'vllm/reasoning/granite_reasoning_parser.py' 2025-09-07T10:19:32.7648676Z #34 11802.7 adding 'vllm/reasoning/hunyuan_a13b_reasoning_parser.py' 2025-09-07T10:19:32.7649395Z #34 11802.7 adding 'vllm/reasoning/mistral_reasoning_parser.py' 2025-09-07T10:19:32.7650087Z #34 11802.7 adding 'vllm/reasoning/qwen3_reasoning_parser.py' 2025-09-07T10:19:32.7650592Z #34 11802.7 adding 'vllm/reasoning/step3_reasoning_parser.py' 2025-09-07T10:19:32.7651144Z #34 11802.7 adding 'vllm/third_party/__init__.py' 2025-09-07T10:19:32.7651606Z #34 11802.7 adding 'vllm/third_party/pynvml.py' 2025-09-07T10:19:32.7652112Z #34 11802.7 adding 'vllm/transformers_utils/__init__.py' 2025-09-07T10:19:32.7652583Z #34 11802.7 adding 'vllm/transformers_utils/config.py' 2025-09-07T10:19:32.7653054Z #34 11802.7 adding 'vllm/transformers_utils/detokenizer.py' 2025-09-07T10:19:32.7653587Z #34 11802.7 adding 'vllm/transformers_utils/detokenizer_utils.py' 
2025-09-07T10:19:32.7654110Z #34 11802.7 adding 'vllm/transformers_utils/dynamic_module.py' 2025-09-07T10:19:32.7654690Z #34 11802.7 adding 'vllm/transformers_utils/processor.py' 2025-09-07T10:19:32.7655179Z #34 11802.7 adding 'vllm/transformers_utils/s3_utils.py' 2025-09-07T10:19:32.7655641Z #34 11802.7 adding 'vllm/transformers_utils/tokenizer.py' 2025-09-07T10:19:32.7656138Z #34 11802.7 adding 'vllm/transformers_utils/tokenizer_base.py' 2025-09-07T10:19:32.7656652Z #34 11802.7 adding 'vllm/transformers_utils/tokenizer_group.py' 2025-09-07T10:19:32.7657277Z #34 11802.7 adding 'vllm/transformers_utils/utils.py' 2025-09-07T10:19:32.7657806Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/__init__.py' 2025-09-07T10:19:32.7658417Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/registry.py' 2025-09-07T10:19:32.7659072Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/template_basic.jinja' 2025-09-07T10:19:32.7659758Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/template_blip2.jinja' 2025-09-07T10:19:32.7660461Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/template_chatml.jinja' 2025-09-07T10:19:32.7661181Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/template_deepseek_vl2.jinja' 2025-09-07T10:19:32.7662009Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/template_fuyu.jinja' 2025-09-07T10:19:32.7662720Z #34 11802.7 adding 'vllm/transformers_utils/chat_templates/template_minicpmv45.jinja' 2025-09-07T10:19:32.7663465Z #34 11802.7 adding 'vllm/transformers_utils/configs/__init__.py' 2025-09-07T10:19:32.7664042Z #34 11802.7 adding 'vllm/transformers_utils/configs/arctic.py' 2025-09-07T10:19:32.7664546Z #34 11802.7 adding 'vllm/transformers_utils/configs/chatglm.py' 2025-09-07T10:19:32.7665187Z #34 11802.7 adding 'vllm/transformers_utils/configs/deepseek_vl2.py' 2025-09-07T10:19:32.7665716Z #34 11802.7 adding 'vllm/transformers_utils/configs/eagle.py' 2025-09-07T10:19:32.7666227Z #34 
11802.7 adding 'vllm/transformers_utils/configs/falcon.py' 2025-09-07T10:19:32.7666718Z #34 11802.7 adding 'vllm/transformers_utils/configs/jais.py' 2025-09-07T10:19:32.7667231Z #34 11802.7 adding 'vllm/transformers_utils/configs/kimi_vl.py' 2025-09-07T10:19:32.7667745Z #34 11802.7 adding 'vllm/transformers_utils/configs/medusa.py' 2025-09-07T10:19:32.7668338Z #34 11802.7 adding 'vllm/transformers_utils/configs/midashenglm.py' 2025-09-07T10:19:32.7668863Z #34 11802.7 adding 'vllm/transformers_utils/configs/mistral.py' 2025-09-07T10:19:32.7669395Z #34 11802.7 adding 'vllm/transformers_utils/configs/mlp_speculator.py' 2025-09-07T10:19:32.7669992Z #34 11802.7 adding 'vllm/transformers_utils/configs/moonvit.py' 2025-09-07T10:19:32.7670500Z #34 11802.7 adding 'vllm/transformers_utils/configs/nemotron.py' 2025-09-07T10:19:32.7671075Z #34 11802.7 adding 'vllm/transformers_utils/configs/nemotron_h.py' 2025-09-07T10:19:32.7671622Z #34 11802.7 adding 'vllm/transformers_utils/configs/nemotron_vl.py' 2025-09-07T10:19:32.7672192Z #34 11802.7 adding 'vllm/transformers_utils/configs/ovis.py' 2025-09-07T10:19:32.7672688Z #34 11802.7 adding 'vllm/transformers_utils/configs/step3_vl.py' 2025-09-07T10:19:32.7673206Z #34 11802.7 adding 'vllm/transformers_utils/configs/ultravox.py' 2025-09-07T10:19:32.7673849Z #34 11802.7 adding 'vllm/transformers_utils/configs/speculators/__init__.py' 2025-09-07T10:19:32.7674579Z #34 11802.7 adding 'vllm/transformers_utils/configs/speculators/algos.py' 2025-09-07T10:19:32.7675184Z #34 11802.7 adding 'vllm/transformers_utils/configs/speculators/base.py' 2025-09-07T10:19:32.7675980Z #34 11802.7 adding 'vllm/transformers_utils/processors/__init__.py' 2025-09-07T10:19:32.7676584Z #34 11802.7 adding 'vllm/transformers_utils/processors/deepseek_vl2.py' 2025-09-07T10:19:32.7677152Z #34 11802.7 adding 'vllm/transformers_utils/processors/ovis.py' 2025-09-07T10:19:32.7677671Z #34 11802.7 adding 'vllm/transformers_utils/processors/ovis2_5.py' 
2025-09-07T10:19:32.7678223Z #34 11802.8 adding 'vllm/transformers_utils/tokenizers/__init__.py' 2025-09-07T10:19:32.7678757Z #34 11802.8 adding 'vllm/transformers_utils/tokenizers/mistral.py' 2025-09-07T10:19:32.7679239Z #34 11802.8 adding 'vllm/triton_utils/__init__.py' 2025-09-07T10:19:32.7679710Z #34 11802.8 adding 'vllm/triton_utils/importing.py' 2025-09-07T10:19:32.7680101Z #34 11802.8 adding 'vllm/usage/__init__.py' 2025-09-07T10:19:32.7680626Z #34 11802.8 adding 'vllm/usage/usage_lib.py' 2025-09-07T10:19:32.7680981Z #34 11802.8 adding 'vllm/utils/__init__.py' 2025-09-07T10:19:32.7681345Z #34 11802.8 adding 'vllm/utils/deep_gemm.py' 2025-09-07T10:19:32.7681764Z #34 11802.8 adding 'vllm/utils/flashinfer.py' 2025-09-07T10:19:32.7682136Z #34 11802.8 adding 'vllm/utils/jsontree.py' 2025-09-07T10:19:32.7682546Z #34 11802.8 adding 'vllm/utils/tensor_schema.py' 2025-09-07T10:19:32.7682949Z #34 11802.8 adding 'vllm/v1/__init__.py' 2025-09-07T10:19:32.7683340Z #34 11802.8 adding 'vllm/v1/cudagraph_dispatcher.py' 2025-09-07T10:19:32.7683753Z #34 11802.8 adding 'vllm/v1/kv_cache_interface.py' 2025-09-07T10:19:32.7684138Z #34 11802.8 adding 'vllm/v1/outputs.py' 2025-09-07T10:19:32.7684474Z #34 11802.8 adding 'vllm/v1/request.py' 2025-09-07T10:19:32.7684828Z #34 11802.8 adding 'vllm/v1/serial_utils.py' 2025-09-07T10:19:32.7685183Z #34 11802.8 adding 'vllm/v1/utils.py' 2025-09-07T10:19:32.7685590Z #34 11802.8 adding 'vllm/v1/attention/__init__.py' 2025-09-07T10:19:32.7686023Z #34 11802.8 adding 'vllm/v1/attention/backends/__init__.py' 2025-09-07T10:19:32.7686506Z #34 11802.8 adding 'vllm/v1/attention/backends/cpu_attn.py' 2025-09-07T10:19:32.7687028Z #34 11802.8 adding 'vllm/v1/attention/backends/flash_attn.py' 2025-09-07T10:19:32.7687515Z #34 11802.8 adding 'vllm/v1/attention/backends/flashinfer.py' 2025-09-07T10:19:32.7688025Z #34 11802.8 adding 'vllm/v1/attention/backends/flex_attention.py' 2025-09-07T10:19:32.7688599Z #34 11802.8 adding 
'vllm/v1/attention/backends/linear_attn.py' 2025-09-07T10:19:32.7689098Z #34 11802.8 adding 'vllm/v1/attention/backends/mamba1_attn.py' 2025-09-07T10:19:32.7689579Z #34 11802.8 adding 'vllm/v1/attention/backends/mamba2_attn.py' 2025-09-07T10:19:32.7690072Z #34 11802.8 adding 'vllm/v1/attention/backends/mamba_attn.py' 2025-09-07T10:19:32.7690654Z #34 11802.8 adding 'vllm/v1/attention/backends/pallas.py' 2025-09-07T10:19:32.7691398Z #34 11802.8 adding 'vllm/v1/attention/backends/rocm_aiter_fa.py' 2025-09-07T10:19:32.7691946Z #34 11802.8 adding 'vllm/v1/attention/backends/short_conv_attn.py' 2025-09-07T10:19:32.7692463Z #34 11802.8 adding 'vllm/v1/attention/backends/tree_attn.py' 2025-09-07T10:19:32.7692969Z #34 11802.8 adding 'vllm/v1/attention/backends/triton_attn.py' 2025-09-07T10:19:32.7693453Z #34 11802.8 adding 'vllm/v1/attention/backends/utils.py' 2025-09-07T10:19:32.7693928Z #34 11802.8 adding 'vllm/v1/attention/backends/xformers.py' 2025-09-07T10:19:32.7694492Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/__init__.py' 2025-09-07T10:19:32.7694991Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/common.py' 2025-09-07T10:19:32.7695528Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/cutlass_mla.py' 2025-09-07T10:19:32.7696090Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/flashattn_mla.py' 2025-09-07T10:19:32.7696644Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/flashmla.py' 2025-09-07T10:19:32.7697188Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/rocm_aiter_mla.py' 2025-09-07T10:19:32.7697747Z #34 11802.8 adding 'vllm/v1/attention/backends/mla/triton_mla.py' 2025-09-07T10:19:32.7698221Z #34 11802.8 adding 'vllm/v1/core/__init__.py' 2025-09-07T10:19:32.7698612Z #34 11802.8 adding 'vllm/v1/core/block_pool.py' 2025-09-07T10:19:32.7699061Z #34 11802.8 adding 'vllm/v1/core/encoder_cache_manager.py' 2025-09-07T10:19:32.7699530Z #34 11802.8 adding 'vllm/v1/core/kv_cache_coordinator.py' 2025-09-07T10:19:32.7699985Z #34 11802.8 adding 
'vllm/v1/core/kv_cache_manager.py' 2025-09-07T10:19:32.7700407Z #34 11802.8 adding 'vllm/v1/core/kv_cache_utils.py' 2025-09-07T10:19:32.7700896Z #34 11802.8 adding 'vllm/v1/core/single_type_kv_cache_manager.py' 2025-09-07T10:19:32.7701471Z #34 11802.8 adding 'vllm/v1/core/sched/__init__.py' 2025-09-07T10:19:32.7701933Z #34 11802.8 adding 'vllm/v1/core/sched/async_scheduler.py' 2025-09-07T10:19:32.7702438Z #34 11802.8 adding 'vllm/v1/core/sched/interface.py' 2025-09-07T10:19:32.7702860Z #34 11802.8 adding 'vllm/v1/core/sched/output.py' 2025-09-07T10:19:32.7703414Z #34 11802.8 adding 'vllm/v1/core/sched/request_queue.py' 2025-09-07T10:19:32.7703835Z #34 11802.8 adding 'vllm/v1/core/sched/scheduler.py' 2025-09-07T10:19:32.7704250Z #34 11802.8 adding 'vllm/v1/core/sched/utils.py' 2025-09-07T10:19:32.7704760Z #34 11802.8 adding 'vllm/v1/engine/__init__.py' 2025-09-07T10:19:32.7705150Z #34 11802.8 adding 'vllm/v1/engine/async_llm.py' 2025-09-07T10:19:32.7705543Z #34 11802.8 adding 'vllm/v1/engine/coordinator.py' 2025-09-07T10:19:32.7705937Z #34 11802.8 adding 'vllm/v1/engine/core.py' 2025-09-07T10:19:32.7706323Z #34 11802.8 adding 'vllm/v1/engine/core_client.py' 2025-09-07T10:19:32.7706795Z #34 11802.8 adding 'vllm/v1/engine/detokenizer.py' 2025-09-07T10:19:32.7707206Z #34 11802.8 adding 'vllm/v1/engine/exceptions.py' 2025-09-07T10:19:32.7707600Z #34 11802.8 adding 'vllm/v1/engine/llm_engine.py' 2025-09-07T10:19:32.7708002Z #34 11802.8 adding 'vllm/v1/engine/logprobs.py' 2025-09-07T10:19:32.7708416Z #34 11802.8 adding 'vllm/v1/engine/output_processor.py' 2025-09-07T10:19:32.7708864Z #34 11802.8 adding 'vllm/v1/engine/parallel_sampling.py' 2025-09-07T10:19:32.7709290Z #34 11802.8 adding 'vllm/v1/engine/processor.py' 2025-09-07T10:19:32.7709705Z #34 11802.8 adding 'vllm/v1/engine/utils.py' 2025-09-07T10:19:32.7710146Z #34 11802.8 adding 'vllm/v1/executor/__init__.py' 2025-09-07T10:19:32.7710543Z #34 11802.8 adding 'vllm/v1/executor/abstract.py' 2025-09-07T10:19:32.7710988Z 
#34 11802.8 adding 'vllm/v1/executor/multiproc_executor.py' 2025-09-07T10:19:32.7711489Z #34 11802.8 adding 'vllm/v1/executor/ray_distributed_executor.py' 2025-09-07T10:19:32.7711952Z #34 11802.8 adding 'vllm/v1/metrics/__init__.py' 2025-09-07T10:19:32.7712332Z #34 11802.8 adding 'vllm/v1/metrics/loggers.py' 2025-09-07T10:19:32.7712736Z #34 11802.8 adding 'vllm/v1/metrics/prometheus.py' 2025-09-07T10:19:32.7713164Z #34 11802.8 adding 'vllm/v1/metrics/ray_wrappers.py' 2025-09-07T10:19:32.9421273Z #34 11802.8 adding 'vllm/v1/metrics/reader.py' 2025-09-07T10:19:32.9421746Z #34 11802.8 adding 'vllm/v1/metrics/stats.py' 2025-09-07T10:19:32.9422183Z #34 11802.8 adding 'vllm/v1/pool/__init__.py' 2025-09-07T10:19:32.9422584Z #34 11802.8 adding 'vllm/v1/pool/metadata.py' 2025-09-07T10:19:32.9422976Z #34 11802.8 adding 'vllm/v1/sample/__init__.py' 2025-09-07T10:19:32.9423528Z #34 11802.8 adding 'vllm/v1/sample/metadata.py' 2025-09-07T10:19:32.9423953Z #34 11802.8 adding 'vllm/v1/sample/rejection_sampler.py' 2025-09-07T10:19:32.9424392Z #34 11802.8 adding 'vllm/v1/sample/sampler.py' 2025-09-07T10:19:32.9425038Z #34 11802.8 adding 'vllm/v1/sample/logits_processor/__init__.py' 2025-09-07T10:19:32.9425589Z #34 11802.8 adding 'vllm/v1/sample/logits_processor/builtin.py' 2025-09-07T10:19:32.9426121Z #34 11802.8 adding 'vllm/v1/sample/logits_processor/interface.py' 2025-09-07T10:19:32.9426632Z #34 11802.8 adding 'vllm/v1/sample/logits_processor/state.py' 2025-09-07T10:19:32.9427108Z #34 11802.8 adding 'vllm/v1/sample/ops/__init__.py' 2025-09-07T10:19:32.9427528Z #34 11802.8 adding 'vllm/v1/sample/ops/bad_words.py' 2025-09-07T10:19:32.9427957Z #34 11802.8 adding 'vllm/v1/sample/ops/logprobs.py' 2025-09-07T10:19:32.9428379Z #34 11802.8 adding 'vllm/v1/sample/ops/penalties.py' 2025-09-07T10:19:32.9428846Z #34 11802.8 adding 'vllm/v1/sample/ops/topk_topp_sampler.py' 2025-09-07T10:19:32.9429316Z #34 11802.8 adding 'vllm/v1/sample/tpu/__init__.py' 2025-09-07T10:19:32.9429730Z #34 11802.8 
adding 'vllm/v1/sample/tpu/metadata.py' 2025-09-07T10:19:32.9430153Z #34 11802.8 adding 'vllm/v1/sample/tpu/sampler.py' 2025-09-07T10:19:32.9430570Z #34 11802.8 adding 'vllm/v1/spec_decode/__init__.py' 2025-09-07T10:19:32.9430990Z #34 11802.8 adding 'vllm/v1/spec_decode/eagle.py' 2025-09-07T10:19:32.9431393Z #34 11802.8 adding 'vllm/v1/spec_decode/medusa.py' 2025-09-07T10:19:32.9431819Z #34 11802.8 adding 'vllm/v1/spec_decode/metadata.py' 2025-09-07T10:19:32.9432321Z #34 11802.8 adding 'vllm/v1/spec_decode/metrics.py' 2025-09-07T10:19:32.9432777Z #34 11802.8 adding 'vllm/v1/spec_decode/ngram_proposer.py' 2025-09-07T10:19:32.9433226Z #34 11802.8 adding 'vllm/v1/spec_decode/utils.py' 2025-09-07T10:19:32.9433765Z #34 11802.8 adding 'vllm/v1/structured_output/__init__.py' 2025-09-07T10:19:32.9434434Z #34 11802.8 adding 'vllm/v1/structured_output/backend_guidance.py' 2025-09-07T10:19:32.9435080Z #34 11802.8 adding 'vllm/v1/structured_output/backend_lm_format_enforcer.py' 2025-09-07T10:19:32.9435678Z #34 11802.8 adding 'vllm/v1/structured_output/backend_outlines.py' 2025-09-07T10:19:32.9436201Z #34 11802.8 adding 'vllm/v1/structured_output/backend_types.py' 2025-09-07T10:19:32.9436736Z #34 11802.8 adding 'vllm/v1/structured_output/backend_xgrammar.py' 2025-09-07T10:19:32.9437251Z #34 11802.8 adding 'vllm/v1/structured_output/request.py' 2025-09-07T10:19:32.9437699Z #34 11802.8 adding 'vllm/v1/structured_output/utils.py' 2025-09-07T10:19:32.9438127Z #34 11802.8 adding 'vllm/v1/worker/__init__.py' 2025-09-07T10:19:32.9438525Z #34 11802.8 adding 'vllm/v1/worker/block_table.py' 2025-09-07T10:19:32.9438957Z #34 11802.8 adding 'vllm/v1/worker/cpu_model_runner.py' 2025-09-07T10:19:32.9439376Z #34 11802.8 adding 'vllm/v1/worker/cpu_worker.py' 2025-09-07T10:19:32.9439801Z #34 11802.8 adding 'vllm/v1/worker/gpu_input_batch.py' 2025-09-07T10:19:32.9440231Z #34 11802.8 adding 'vllm/v1/worker/gpu_model_runner.py' 2025-09-07T10:19:32.9440743Z #34 11802.8 adding 
'vllm/v1/worker/gpu_worker.py' 2025-09-07T10:19:32.9441236Z #34 11802.8 adding 'vllm/v1/worker/kv_connector_model_runner_mixin.py' 2025-09-07T10:19:32.9441766Z #34 11802.8 adding 'vllm/v1/worker/lora_model_runner_mixin.py' 2025-09-07T10:19:32.9442242Z #34 11802.8 adding 'vllm/v1/worker/tpu_input_batch.py' 2025-09-07T10:19:32.9442673Z #34 11802.8 adding 'vllm/v1/worker/tpu_model_runner.py' 2025-09-07T10:19:32.9443113Z #34 11802.8 adding 'vllm/v1/worker/tpu_worker.py' 2025-09-07T10:19:32.9443501Z #34 11802.8 adding 'vllm/v1/worker/utils.py' 2025-09-07T10:19:32.9443905Z #34 11802.8 adding 'vllm/v1/worker/worker_base.py' 2025-09-07T10:19:32.9444342Z #34 11802.8 adding 'vllm/v1/worker/xpu_model_runner.py' 2025-09-07T10:19:32.9444760Z #34 11802.8 adding 'vllm/v1/worker/xpu_worker.py' 2025-09-07T10:19:32.9445178Z #34 11802.8 adding 'vllm/vllm_flash_attn/.gitkeep' 2025-09-07T10:19:32.9445594Z #34 11802.8 adding 'vllm/vllm_flash_attn/__init__.py' 2025-09-07T10:19:44.4928750Z #34 11814.5 adding 'vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so' 2025-09-07T10:20:18.3196564Z #34 11848.4 adding 'vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so' 2025-09-07T10:20:19.8153048Z #34 11849.9 adding 'vllm/vllm_flash_attn/flash_attn_interface.py' 2025-09-07T10:20:19.9893545Z #34 11849.9 adding 'vllm/vllm_flash_attn/layers/__init__.py' 2025-09-07T10:20:19.9894127Z #34 11849.9 adding 'vllm/vllm_flash_attn/layers/rotary.py' 2025-09-07T10:20:19.9894673Z #34 11849.9 adding 'vllm/vllm_flash_attn/ops/triton/__init__.py' 2025-09-07T10:20:19.9895218Z #34 11849.9 adding 'vllm/vllm_flash_attn/ops/triton/rotary.py' 2025-09-07T10:20:19.9895694Z #34 11849.9 adding 'vllm/worker/__init__.py' 2025-09-07T10:20:19.9896101Z #34 11849.9 adding 'vllm/worker/cache_engine.py' 2025-09-07T10:20:19.9896536Z #34 11849.9 adding 'vllm/worker/enc_dec_model_runner.py' 2025-09-07T10:20:19.9896983Z #34 11849.9 adding 'vllm/worker/model_runner.py' 2025-09-07T10:20:19.9897406Z #34 11849.9 adding 'vllm/worker/model_runner_base.py' 
2025-09-07T10:20:19.9897840Z #34 11849.9 adding 'vllm/worker/utils.py' 2025-09-07T10:20:19.9898222Z #34 11849.9 adding 'vllm/worker/worker.py' 2025-09-07T10:20:19.9898605Z #34 11849.9 adding 'vllm/worker/worker_base.py' 2025-09-07T10:20:19.9899231Z #34 11849.9 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/licenses/LICENSE' 2025-09-07T10:20:19.9899999Z #34 11849.9 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/METADATA' 2025-09-07T10:20:19.9900714Z #34 11849.9 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/WHEEL' 2025-09-07T10:20:19.9901669Z #34 11849.9 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/entry_points.txt' 2025-09-07T10:20:19.9902504Z #34 11849.9 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/top_level.txt' 2025-09-07T10:20:19.9903369Z #34 11849.9 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129.dist-info/RECORD' 2025-09-07T10:20:19.9903923Z #34 11849.9 removing build/bdist.linux-x86_64/wheel 2025-09-07T10:20:20.3751120Z #34 11850.4 Compile requests 504 2025-09-07T10:20:20.3751807Z #34 11850.4 Compile requests executed 504 2025-09-07T10:20:20.3752392Z #34 11850.4 Cache hits 72 2025-09-07T10:20:20.3752945Z #34 11850.4 Cache hits (C/C++) 7 2025-09-07T10:20:20.3753482Z #34 11850.4 Cache hits (CUDA) 65 2025-09-07T10:20:20.3754054Z #34 11850.4 Cache misses 432 2025-09-07T10:20:20.3754578Z #34 11850.4 Cache misses (C/C++) 3 2025-09-07T10:20:20.3755121Z #34 11850.4 Cache misses (CUDA) 429 2025-09-07T10:20:20.3755636Z #34 11850.4 Cache timeouts 0 2025-09-07T10:20:20.3756163Z #34 11850.4 Cache read errors 0 2025-09-07T10:20:20.3756679Z #34 11850.4 Forced recaches 0 2025-09-07T10:20:20.3757218Z #34 11850.4 Cache write errors 0 2025-09-07T10:20:20.3757953Z #34 11850.4 Compilation failures 0 2025-09-07T10:20:20.3758370Z #34 11850.4 Cache errors 0 2025-09-07T10:20:20.3758808Z #34 11850.4 Non-cacheable compilations 0 2025-09-07T10:20:20.3759238Z #34 11850.4 
Non-cacheable calls 0 2025-09-07T10:20:20.3759673Z #34 11850.4 Non-compilation calls 0 2025-09-07T10:20:20.3760104Z #34 11850.4 Unsupported compiler calls 0 2025-09-07T10:20:20.3760547Z #34 11850.4 Average cache write 0.093 s 2025-09-07T10:20:20.3760985Z #34 11850.4 Average compiler 262.869 s 2025-09-07T10:20:20.3761416Z #34 11850.4 Average cache read hit 0.095 s 2025-09-07T10:20:20.3761864Z #34 11850.4 Failed distributed compilations 0 2025-09-07T10:20:20.3762442Z #34 11850.4 Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-09-07T10:20:20.3763019Z #34 11850.4 Version (client) 0.8.1 2025-09-07T10:20:20.7647404Z #34 DONE 11850.8s 2025-09-07T10:20:20.9175418Z 2025-09-07T10:20:20.9179577Z #35 [build 6/7] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=.git,target=.git if [ "1" != "1" ]; then rm -rf .deps && mkdir -p .deps && export VLLM_DOCKER_BUILD_CONTEXT=1 && python3 setup.py bdist_wheel --dist-dir=vllm-dist --py-limited-api=cp38; fi 2025-09-07T10:20:21.3295934Z #35 DONE 0.6s 2025-09-07T10:20:21.4820348Z 2025-09-07T10:20:21.4821412Z #36 [build 7/7] RUN echo "[INFO] Listing current directory:" && ls -al && echo "[INFO] Showing torch_build_versions.txt content:" && cat torch_build_versions.txt 2025-09-07T10:20:22.0956071Z #36 0.764 [INFO] Listing current directory: 2025-09-07T10:20:22.2727566Z #36 0.770 total 356 2025-09-07T10:20:22.2727966Z #36 0.770 drwxr-xr-x. 1 root root 94 Sep 7 10:19 . 2025-09-07T10:20:22.2728511Z #36 0.770 drwxr-xr-x. 1 root root 6 Sep 7 10:20 .. 2025-09-07T10:20:22.2729011Z #36 0.770 drwxr-xr-x. 10 root root 16384 Sep 7 06:21 benchmarks 2025-09-07T10:20:22.2729499Z #36 0.770 drwxr-xr-x. 5 root root 105 Sep 7 10:18 build 2025-09-07T10:20:22.2729980Z #36 0.770 drwxr-xr-x. 5 root root 16384 Sep 7 06:21 .buildkite 2025-09-07T10:20:22.2730513Z #36 0.770 -rw-r--r--. 
1 root root 641 Sep 7 06:21 .clang-format 2025-09-07T10:20:22.2731139Z #36 0.770 drwxr-xr-x. 3 root root 94 Sep 7 06:21 cmake 2025-09-07T10:20:22.2731795Z #36 0.770 -rw-r--r--. 1 root root 38227 Sep 7 06:21 CMakeLists.txt 2025-09-07T10:20:22.2732542Z #36 0.770 -rw-r--r--. 1 root root 5318 Sep 7 06:21 CODE_OF_CONDUCT.md 2025-09-07T10:20:22.2733077Z #36 0.770 -rw-r--r--. 1 root root 140 Sep 7 06:21 CONTRIBUTING.md 2025-09-07T10:20:22.2733581Z #36 0.770 drwxr-xr-x. 1 root root 63 Sep 7 06:21 csrc 2025-09-07T10:20:22.2734046Z #36 0.770 -rw-r--r--. 1 root root 1366 Sep 7 06:21 DCO 2025-09-07T10:20:22.2734495Z #36 0.770 drwxr-xr-x. 10 root root 16384 Sep 7 07:03 .deps 2025-09-07T10:20:22.2735065Z #36 0.770 drwxr-xr-x. 2 root root 16384 Sep 7 06:21 docker 2025-09-07T10:20:22.2735546Z #36 0.770 -rw-r--r--. 1 root root 345 Sep 7 06:21 .dockerignore 2025-09-07T10:20:22.2736042Z #36 0.770 drwxr-xr-x. 18 root root 16384 Sep 7 06:21 docs 2025-09-07T10:20:22.2736507Z #36 0.770 drwxr-xr-x. 5 root root 16384 Sep 7 06:21 examples 2025-09-07T10:20:22.2737024Z #36 0.770 -rw-r--r--. 1 root root 944 Sep 7 06:21 find_cuda_init.py 2025-09-07T10:20:22.2737546Z #36 0.770 -rwxr-xr-x. 1 root root 284 Sep 7 06:21 format.sh 2025-09-07T10:20:22.2738024Z #36 0.770 drwxr-xr-x. 2 root root 25 Sep 7 06:21 .gemini 2025-09-07T10:20:22.2738501Z #36 0.770 drwxr-xr-x. 8 root root 181 Sep 7 06:21 .git 2025-09-07T10:20:22.2738961Z #36 0.770 drwxr-xr-x. 5 root root 16384 Sep 7 06:21 .github 2025-09-07T10:20:22.2739441Z #36 0.770 -rw-r--r--. 1 root root 3734 Sep 7 06:21 .gitignore 2025-09-07T10:20:22.2739908Z #36 0.770 -rw-r--r--. 1 root root 11357 Sep 7 06:21 LICENSE 2025-09-07T10:20:22.2740456Z #36 0.770 -rw-r--r--. 1 root root 212 Sep 7 06:21 MANIFEST.in 2025-09-07T10:20:22.2740975Z #36 0.770 -rw-r--r--. 1 root root 165 Sep 7 06:21 .markdownlint.yaml 2025-09-07T10:20:22.2741487Z #36 0.770 -rw-r--r--. 1 root root 4237 Sep 7 06:21 mkdocs.yaml 2025-09-07T10:20:22.2742023Z #36 0.770 -rw-r--r--. 
1 root root 6134 Sep 7 06:21 .pre-commit-config.yaml 2025-09-07T10:20:22.2742563Z #36 0.770 -rw-r--r--. 1 root root 8187 Sep 7 07:02 pyproject.toml 2025-09-07T10:20:22.2743233Z #36 0.770 -rw-r--r--. 1 root root 12531 Sep 7 06:21 README.md 2025-09-07T10:20:22.2743819Z #36 0.770 -rw-r--r--. 1 root root 416 Sep 7 06:21 .readthedocs.yaml 2025-09-07T10:20:22.2744300Z #36 0.770 -rw-r--r--. 1 root root 5696 Sep 7 06:21 RELEASE.md 2025-09-07T10:20:22.2744780Z #36 0.770 drwxr-xr-x. 1 root root 159 Sep 7 06:21 requirements 2025-09-07T10:20:22.2745247Z #36 0.770 -rw-r--r--. 1 root root 3657 Sep 7 06:21 SECURITY.md 2025-09-07T10:20:22.2745703Z #36 0.770 -rw-r--r--. 1 root root 24740 Sep 7 06:21 setup.py 2025-09-07T10:20:22.2746153Z #36 0.770 -rw-r--r--. 1 root root 496 Sep 7 06:21 .shellcheckrc 2025-09-07T10:20:22.2746616Z #36 0.770 drwxr-xr-x. 46 root root 16384 Sep 7 06:21 tests 2025-09-07T10:20:22.2747106Z #36 0.770 drwxr-xr-x. 2 root root 16384 Sep 7 06:20 tmp 2025-09-07T10:20:22.2747550Z #36 0.770 drwxr-xr-x. 4 root root 16384 Sep 7 06:21 tools 2025-09-07T10:20:22.2748043Z #36 0.770 -rw-r--r--. 1 root root 290 Sep 7 07:02 torch_build_versions.txt 2025-09-07T10:20:22.2748572Z #36 0.770 -rw-r--r--. 1 root root 654 Sep 7 06:21 use_existing_torch.py 2025-09-07T10:20:22.2749420Z #36 0.770 drwxr-xr-x. 1 root root 67 Sep 7 07:02 vllm 2025-09-07T10:20:22.2750070Z #36 0.770 drwxr-xr-x. 2 root root 89 Sep 7 10:19 vllm-dist 2025-09-07T10:20:22.2750590Z #36 0.770 drwxr-xr-x. 2 root root 134 Sep 7 07:02 vllm.egg-info 2025-09-07T10:20:22.2751101Z #36 0.770 drwxr-xr-x. 2 root root 75 Sep 7 07:00 xformers-dist 2025-09-07T10:20:22.2751619Z #36 0.770 -rw-r--r--. 
1 root root 15 Sep 7 06:21 .yapfignore 2025-09-07T10:20:22.2752107Z #36 0.770 [INFO] Showing torch_build_versions.txt content: 2025-09-07T10:20:22.2752746Z #36 0.773 torch @ file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:20:22.2753625Z #36 0.773 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:20:22.2754569Z #36 0.773 torchvision @ file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:20:22.2755217Z #36 DONE 0.8s 2025-09-07T10:20:23.8402354Z 2025-09-07T10:20:23.8403184Z #37 [export-wheels 2/4] COPY --from=build /workspace/vllm-dist /wheels/vllm 2025-09-07T10:20:24.0520427Z #37 DONE 0.0s 2025-09-07T10:20:24.0520865Z 2025-09-07T10:20:24.0521525Z #38 [vllm-base 6/18] COPY --from=build /workspace/vllm-dist /wheels/vllm 2025-09-07T10:20:24.0522736Z #38 DONE 0.0s 2025-09-07T10:20:24.0523096Z 2025-09-07T10:20:24.0525043Z #39 [vllm-base 7/18] RUN echo "[INFO] Listing current directory before torch install step:" && ls -al && echo "[INFO] Showing torch_build_versions.txt content:" && cat torch_build_versions.txt 2025-09-07T10:20:24.4471486Z #39 0.546 [INFO] Listing current directory before torch install step: 2025-09-07T10:20:24.6229442Z #39 0.550 total 4 2025-09-07T10:20:24.6230384Z #39 0.550 drwxr-xr-x. 1 root root 38 Sep 7 07:02 . 2025-09-07T10:20:24.6231506Z #39 0.550 drwxr-xr-x. 1 root root 6 Sep 7 10:20 .. 2025-09-07T10:20:24.6232831Z #39 0.550 -rw-r--r--. 
1 root root 290 Sep 7 07:02 torch_build_versions.txt 2025-09-07T10:20:24.6234208Z #39 0.550 [INFO] Showing torch_build_versions.txt content: 2025-09-07T10:20:24.6235746Z #39 0.553 torch @ file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:20:24.6236593Z #39 0.553 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:20:24.6237828Z #39 0.553 torchvision @ file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:20:24.6238441Z #39 DONE 0.6s 2025-09-07T10:20:24.6238579Z 2025-09-07T10:20:24.6238867Z #40 [vllm-base 8/18] RUN ldconfig /usr/local/cuda-$(echo 12.8.1 | cut -d. -f1,2)/compat/ 2025-09-07T10:20:25.2690641Z #40 DONE 0.8s 2025-09-07T10:20:25.4211599Z 2025-09-07T10:20:25.4213538Z #41 [vllm-base 9/18] RUN --mount=type=cache,target=/root/.cache/uv if ! python3 -m uv --version > /dev/null 2>&1; then python3 -m pip install uv==0.8.4; fi 2025-09-07T10:20:27.0180109Z #41 1.748 Collecting uv==0.8.4 2025-09-07T10:20:27.2143395Z #41 1.781 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB) 2025-09-07T10:20:27.2144446Z #41 1.797 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.8 MB) 2025-09-07T10:20:27.4294727Z #41 1.944 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.8/18.8 MB 148.6 MB/s 0:00:00 2025-09-07T10:20:27.4295330Z #41 2.009 Installing collected packages: uv 2025-09-07T10:20:27.5658964Z #41 2.296 Successfully installed uv-0.8.4 2025-09-07T10:20:27.6773024Z #41 2.296 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-09-07T10:20:27.6775512Z #41 DONE 2.4s 2025-09-07T10:20:27.8285346Z 2025-09-07T10:20:27.8293697Z #42 [vllm-base 10/18] RUN --mount=type=bind,source=tmp,target=/dist --mount=type=cache,target=/root/.cache/uv if [ -n "tmp" ] && [ "tmp" != "./requirements" ] && [ -d "/dist" ] && ls /dist/torch*.whl >/dev/null 2>&1; then torch_whl=$(find /dist -maxdepth 1 -name 'torch-*.whl' -print -quit); vision_whl=$(find /dist -name 'torchvision*.whl' | head -n1 | xargs); audio_whl=$(find /dist -name 'torchaudio*.whl' | head -n1 | xargs); echo "[INFO] Use wheels to build : '${torch_whl}' '${audio_whl}' '${vision_whl}'"; uv pip install --system "${torch_whl}[opt-einsum]" "${vision_whl}" "${audio_whl}" /dist/*.whl; else echo "[INFO] Installing torch versions from torch_build_versions.txt"; uv pip install --system $(cat torch_build_versions.txt | xargs) --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. -f1,2 | tr -d '.'); fi 2025-09-07T10:20:28.6033066Z #42 0.925 [INFO] Use wheels to build : '/dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl' '/dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl' '/dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl' 2025-09-07T10:20:28.8440765Z #42 0.936 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T10:20:28.8441323Z #42 0.975 Resolved 31 packages in 34ms 2025-09-07T10:20:28.8442040Z #42 1.015 Uninstalled 1 package in 39ms 2025-09-07T10:20:30.5571313Z #42 2.879 Installed 31 packages in 1.86s 2025-09-07T10:20:30.7133335Z #42 2.883 + filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T10:20:30.7134063Z #42 2.883 + fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T10:20:30.7134687Z #42 2.883 + jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T10:20:30.7135500Z #42 2.883 + markupsafe==3.0.2 (from 
file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:20:30.7136335Z #42 2.883 + mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T10:20:30.7136962Z #42 2.883 + networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T10:20:30.7137718Z #42 2.883 + numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:20:30.7138684Z #42 2.883 + nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:20:30.7139904Z #42 2.883 + nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) 2025-09-07T10:20:30.7141057Z #42 2.883 + nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:20:30.7142317Z #42 2.883 + nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:20:30.7143442Z #42 2.883 + nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:20:30.7144589Z #42 2.883 + nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:20:30.7145697Z #42 2.883 + nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:20:30.7146741Z #42 2.883 + nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:20:30.7147805Z #42 2.883 + nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:20:30.7149077Z #42 2.883 + 
nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:20:30.7150395Z #42 2.883 + nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T10:20:30.7151462Z #42 2.883 + nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:20:30.7152601Z #42 2.883 + nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:20:30.7153797Z #42 2.883 + nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:20:30.7154906Z #42 2.883 + nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T10:20:30.7155600Z #42 2.883 + opt-einsum==3.4.0 2025-09-07T10:20:30.7156221Z #42 2.883 + pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:20:30.7157451Z #42 2.884 + pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:20:30.7158342Z #42 2.884 - setuptools==80.9.0 2025-09-07T10:20:30.7158831Z #42 2.884 + setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T10:20:30.7159452Z #42 2.884 + sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T10:20:30.7160339Z #42 2.884 + torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:20:30.7161549Z #42 2.885 + torchaudio==2.8.0.dev20250901+cu129 (from file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 
2025-09-07T10:20:30.7162725Z #42 2.885 + torchvision==0.24.0.dev20250901+cu129 (from file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:20:30.7163721Z #42 2.885 + typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T10:21:28.9853007Z #42 DONE 61.3s 2025-09-07T10:21:29.1383900Z 2025-09-07T10:21:29.1384825Z #43 [vllm-base 11/18] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system /wheels/vllm/*.whl --verbose 2025-09-07T10:21:29.4126807Z #43 0.425 DEBUG uv 0.8.4 2025-09-07T10:21:29.5170514Z #43 0.430 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T10:21:29.5172478Z #43 0.430 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T10:21:29.5173503Z #43 0.432 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at `/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T10:21:29.5174369Z #43 0.432 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T10:21:29.5174928Z #43 0.432 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T10:21:29.5175868Z #43 0.434 DEBUG At least one requirement is not satisfied: file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl 2025-09-07T10:21:29.5176738Z #43 0.435 DEBUG Using request timeout of 500s 2025-09-07T10:21:29.5177204Z #43 0.448 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T10:21:29.5177707Z #43 0.448 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T10:21:29.5178188Z #43 0.449 DEBUG Adding direct dependency: vllm* 2025-09-07T10:21:29.5179083Z #43 0.449 DEBUG Searching for a compatible version of vllm @ file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl (*) 2025-09-07T10:21:29.5180617Z #43 0.449 DEBUG Adding transitive dependency for 
vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: regex* 2025-09-07T10:21:29.5181647Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: cachetools* 2025-09-07T10:21:29.5182557Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: psutil* 2025-09-07T10:21:29.5183536Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: sentencepiece* 2025-09-07T10:21:29.5184563Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: numpy* 2025-09-07T10:21:29.5185464Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: requests>=2.26.0 2025-09-07T10:21:29.5186563Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: tqdm* 2025-09-07T10:21:29.5187461Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: blake3* 2025-09-07T10:21:29.5188459Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: py-cpuinfo* 2025-09-07T10:21:29.5189406Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: transformers>=4.55.2 2025-09-07T10:21:29.5190480Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: tokenizers>=0.21.1 2025-09-07T10:21:29.5191393Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: protobuf* 2025-09-07T10:21:29.5192361Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: fastapi[standard]>=0.115.0 2025-09-07T10:21:29.5193377Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: aiohttp* 2025-09-07T10:21:29.5194279Z #43 0.449 DEBUG Adding transitive dependency for 
vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: openai>=1.99.1 2025-09-07T10:21:29.5195208Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: pydantic>=2.11.7 2025-09-07T10:21:29.5196190Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: prometheus-client>=0.18.0 2025-09-07T10:21:29.5197147Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: pillow* 2025-09-07T10:21:29.5198169Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: prometheus-fastapi-instrumentator>=7.0.0 2025-09-07T10:21:29.5199235Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: tiktoken>=0.6.0 2025-09-07T10:21:29.5200332Z #43 0.449 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: lm-format-enforcer>=0.11.3, <0.11.3+ 2025-09-07T10:21:29.5201816Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}>=0.7.11, <0.8.0 2025-09-07T10:21:29.5203407Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: outlines-core{platform_machine != 's390x'}>=0.2.10, <0.2.10+ 2025-09-07T10:21:29.5204589Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: diskcache>=5.6.3, <5.6.3+ 2025-09-07T10:21:29.5205572Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: lark>=1.2.2, <1.2.2+ 2025-09-07T10:21:29.5206974Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}>=0.1.23, <0.1.23+ 
2025-09-07T10:21:29.5208522Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: typing-extensions>=4.10 2025-09-07T10:21:29.5209888Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: filelock>=3.16.1 2025-09-07T10:21:29.5211009Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: partial-json-parser* 2025-09-07T10:21:29.5212056Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: pyzmq>=25.0.0 2025-09-07T10:21:29.5212978Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: msgspec* 2025-09-07T10:21:29.5213885Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: gguf>=0.13.0 2025-09-07T10:21:29.5225653Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: mistral-common[audio]>=1.8.2 2025-09-07T10:21:29.5226753Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: mistral-common[image]>=1.8.2 2025-09-07T10:21:29.5227848Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: opencv-python-headless>=4.11.0 2025-09-07T10:21:29.5228846Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: pyyaml* 2025-09-07T10:21:29.5229951Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: six{python_full_version >= '3.12'}>=1.16.0 2025-09-07T10:21:29.5231582Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: setuptools{python_full_version >= '3.12'}>=77.0.3, <80 2025-09-07T10:21:29.5232729Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: einops* 
2025-09-07T10:21:29.5233726Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: compressed-tensors>=0.11.0, <0.11.0+ 2025-09-07T10:21:29.5234802Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: depyf>=0.19.0, <0.19.0+ 2025-09-07T10:21:29.5235747Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: cloudpickle* 2025-09-07T10:21:29.5236661Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: watchfiles* 2025-09-07T10:21:29.5237609Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: python-json-logger* 2025-09-07T10:21:29.5238525Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: scipy* 2025-09-07T10:21:29.5239952Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: ninja* 2025-09-07T10:21:29.5241138Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: pybase64* 2025-09-07T10:21:29.5242095Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: cbor2* 2025-09-07T10:21:29.5243625Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: setproctitle* 2025-09-07T10:21:29.5244669Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: openai-harmony>=0.0.3 2025-09-07T10:21:29.5245816Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: numba{python_full_version >= '3.10'}>=0.61.2, <0.61.2+ 2025-09-07T10:21:29.5246978Z #43 0.450 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129: ray[cgraph]>=2.48.0 2025-09-07T10:21:29.5248405Z #43 0.457 DEBUG Found stale response for: 
https://pypi.org/simple/cachetools/ 2025-09-07T10:21:29.5249361Z #43 0.457 DEBUG Sending revalidation request for: https://pypi.org/simple/cachetools/ 2025-09-07T10:21:29.5250562Z #43 0.457 DEBUG Found stale response for: https://pypi.org/simple/psutil/ 2025-09-07T10:21:29.5251609Z #43 0.457 DEBUG Sending revalidation request for: https://pypi.org/simple/psutil/ 2025-09-07T10:21:29.5252311Z #43 0.457 DEBUG Found stale response for: https://pypi.org/simple/sentencepiece/ 2025-09-07T10:21:29.5253064Z #43 0.457 DEBUG Sending revalidation request for: https://pypi.org/simple/sentencepiece/ 2025-09-07T10:21:29.5253788Z #43 0.457 DEBUG Found stale response for: https://pypi.org/simple/requests/ 2025-09-07T10:21:29.5254473Z #43 0.457 DEBUG Sending revalidation request for: https://pypi.org/simple/requests/ 2025-09-07T10:21:29.5255285Z #43 0.457 DEBUG Found stale response for: https://pypi.org/simple/tqdm/ 2025-09-07T10:21:29.5256220Z #43 0.457 DEBUG Sending revalidation request for: https://pypi.org/simple/tqdm/ 2025-09-07T10:21:29.5256903Z #43 0.458 DEBUG Found stale response for: https://pypi.org/simple/blake3/ 2025-09-07T10:21:29.5257564Z #43 0.458 DEBUG Sending revalidation request for: https://pypi.org/simple/blake3/ 2025-09-07T10:21:29.5258262Z #43 0.458 DEBUG Found stale response for: https://pypi.org/simple/py-cpuinfo/ 2025-09-07T10:21:29.5258984Z #43 0.458 DEBUG Sending revalidation request for: https://pypi.org/simple/py-cpuinfo/ 2025-09-07T10:21:29.5259699Z #43 0.458 DEBUG Found stale response for: https://pypi.org/simple/transformers/ 2025-09-07T10:21:29.5262130Z #43 0.458 DEBUG Sending revalidation request for: https://pypi.org/simple/transformers/ 2025-09-07T10:21:29.5263023Z #43 0.458 DEBUG Found stale response for: https://pypi.org/simple/fastapi/ 2025-09-07T10:21:29.5263882Z #43 0.458 DEBUG Sending revalidation request for: https://pypi.org/simple/fastapi/ 2025-09-07T10:21:29.5264547Z #43 0.458 DEBUG Found stale response for: 
https://pypi.org/simple/openai/ 2025-09-07T10:21:29.5265745Z #43 0.458 DEBUG Sending revalidation request for: https://pypi.org/simple/openai/ 2025-09-07T10:21:29.5266651Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/pydantic/ 2025-09-07T10:21:29.5267589Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/pydantic/ 2025-09-07T10:21:29.5268316Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/prometheus-client/ 2025-09-07T10:21:29.5269064Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/prometheus-client/ 2025-09-07T10:21:29.5269913Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/prometheus-fastapi-instrumentator/ 2025-09-07T10:21:29.5270832Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/prometheus-fastapi-instrumentator/ 2025-09-07T10:21:29.5271621Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/tiktoken/ 2025-09-07T10:21:29.5272299Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/tiktoken/ 2025-09-07T10:21:29.5273104Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/lm-format-enforcer/ 2025-09-07T10:21:29.5273875Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/lm-format-enforcer/ 2025-09-07T10:21:29.5274992Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/llguidance/ 2025-09-07T10:21:29.5276167Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/llguidance/ 2025-09-07T10:21:29.5277078Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/outlines-core/ 2025-09-07T10:21:29.5277927Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/outlines-core/ 2025-09-07T10:21:29.5278634Z #43 0.459 DEBUG Found stale response for: https://pypi.org/simple/diskcache/ 2025-09-07T10:21:29.5279494Z #43 0.459 DEBUG Sending revalidation request for: https://pypi.org/simple/diskcache/ 
2025-09-07T10:21:29.5280177Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/lark/ 2025-09-07T10:21:29.5280837Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/lark/ 2025-09-07T10:21:29.5281499Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/xgrammar/ 2025-09-07T10:21:29.5282264Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/xgrammar/ 2025-09-07T10:21:29.5282994Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:21:29.5283779Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:21:29.5284521Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/filelock/ 2025-09-07T10:21:29.5285206Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/filelock/ 2025-09-07T10:21:29.5285963Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/partial-json-parser/ 2025-09-07T10:21:29.5286753Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/partial-json-parser/ 2025-09-07T10:21:29.5287508Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/msgspec/ 2025-09-07T10:21:29.5288185Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/msgspec/ 2025-09-07T10:21:29.5288857Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/gguf/ 2025-09-07T10:21:29.5289511Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/gguf/ 2025-09-07T10:21:29.5290204Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/mistral-common/ 2025-09-07T10:21:29.5291204Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/mistral-common/ 2025-09-07T10:21:29.5292231Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/opencv-python-headless/ 2025-09-07T10:21:29.5293344Z #43 0.460 DEBUG Sending revalidation request for: 
https://pypi.org/simple/opencv-python-headless/ 2025-09-07T10:21:29.5294191Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/pyyaml/ 2025-09-07T10:21:29.5294917Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/pyyaml/ 2025-09-07T10:21:29.5295959Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/six/ 2025-09-07T10:21:29.5296735Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/six/ 2025-09-07T10:21:29.5297427Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/setuptools/ 2025-09-07T10:21:29.5298142Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/setuptools/ 2025-09-07T10:21:29.5298848Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/einops/ 2025-09-07T10:21:29.5299526Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/einops/ 2025-09-07T10:21:29.5300244Z #43 0.460 DEBUG Found stale response for: https://pypi.org/simple/compressed-tensors/ 2025-09-07T10:21:29.5301034Z #43 0.460 DEBUG Sending revalidation request for: https://pypi.org/simple/compressed-tensors/ 2025-09-07T10:21:29.5302058Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/depyf/ 2025-09-07T10:21:29.5302841Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/depyf/ 2025-09-07T10:21:29.5303517Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/cloudpickle/ 2025-09-07T10:21:29.5304205Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/cloudpickle/ 2025-09-07T10:21:29.5304915Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/watchfiles/ 2025-09-07T10:21:29.5305597Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/watchfiles/ 2025-09-07T10:21:29.5306332Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/python-json-logger/ 2025-09-07T10:21:29.5307082Z #43 0.461 DEBUG Sending revalidation request for: 
https://pypi.org/simple/python-json-logger/ 2025-09-07T10:21:29.5307793Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/scipy/ 2025-09-07T10:21:29.5308444Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/scipy/ 2025-09-07T10:21:29.5309083Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/ninja/ 2025-09-07T10:21:29.5309770Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/ninja/ 2025-09-07T10:21:29.5310425Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/pybase64/ 2025-09-07T10:21:29.5311107Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/pybase64/ 2025-09-07T10:21:29.5311787Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/cbor2/ 2025-09-07T10:21:29.5312426Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/cbor2/ 2025-09-07T10:21:29.5313112Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/setproctitle/ 2025-09-07T10:21:29.5313817Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/setproctitle/ 2025-09-07T10:21:29.5314540Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/openai-harmony/ 2025-09-07T10:21:29.5315260Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/openai-harmony/ 2025-09-07T10:21:29.5315951Z #43 0.461 DEBUG Found stale response for: https://pypi.org/simple/numba/ 2025-09-07T10:21:29.5316597Z #43 0.461 DEBUG Sending revalidation request for: https://pypi.org/simple/numba/ 2025-09-07T10:21:29.5317229Z #43 0.462 DEBUG Found stale response for: https://pypi.org/simple/numpy/ 2025-09-07T10:21:29.5317876Z #43 0.462 DEBUG Sending revalidation request for: https://pypi.org/simple/numpy/ 2025-09-07T10:21:29.5318577Z #43 0.462 DEBUG Found stale response for: https://pypi.org/simple/tokenizers/ 2025-09-07T10:21:29.5319278Z #43 0.462 DEBUG Sending revalidation request for: https://pypi.org/simple/tokenizers/ 
2025-09-07T10:21:29.5319962Z #43 0.462 DEBUG Found stale response for: https://pypi.org/simple/protobuf/ 2025-09-07T10:21:29.5320644Z #43 0.462 DEBUG Sending revalidation request for: https://pypi.org/simple/protobuf/ 2025-09-07T10:21:29.5321354Z #43 0.462 DEBUG Found stale response for: https://pypi.org/simple/pillow/ 2025-09-07T10:21:29.5321998Z #43 0.462 DEBUG Sending revalidation request for: https://pypi.org/simple/pillow/ 2025-09-07T10:21:29.5322657Z #43 0.464 DEBUG Found stale response for: https://pypi.org/simple/regex/ 2025-09-07T10:21:29.5323286Z #43 0.464 DEBUG Sending revalidation request for: https://pypi.org/simple/regex/ 2025-09-07T10:21:29.5323935Z #43 0.465 DEBUG Found stale response for: https://pypi.org/simple/pyzmq/ 2025-09-07T10:21:29.5324578Z #43 0.465 DEBUG Sending revalidation request for: https://pypi.org/simple/pyzmq/ 2025-09-07T10:21:29.5325251Z #43 0.481 DEBUG Found not-modified response for: https://pypi.org/simple/numpy/ 2025-09-07T10:21:29.5325941Z #43 0.483 DEBUG Found not-modified response for: https://pypi.org/simple/protobuf/ 2025-09-07T10:21:29.5326616Z #43 0.485 DEBUG Found not-modified response for: https://pypi.org/simple/psutil/ 2025-09-07T10:21:29.5327294Z #43 0.485 DEBUG Found not-modified response for: https://pypi.org/simple/pillow/ 2025-09-07T10:21:29.5328009Z #43 0.487 DEBUG Found not-modified response for: https://pypi.org/simple/pyzmq/ 2025-09-07T10:21:29.5328688Z #43 0.491 DEBUG Found not-modified response for: https://pypi.org/simple/openai/ 2025-09-07T10:21:29.5329393Z #43 0.491 DEBUG Found not-modified response for: https://pypi.org/simple/cachetools/ 2025-09-07T10:21:29.5330116Z #43 0.491 DEBUG Found not-modified response for: https://pypi.org/simple/sentencepiece/ 2025-09-07T10:21:29.5330952Z #43 0.491 DEBUG Found not-modified response for: https://pypi.org/simple/requests/ 2025-09-07T10:21:29.5331817Z #43 0.492 DEBUG Found not-modified response for: https://pypi.org/simple/tqdm/ 2025-09-07T10:21:29.5332503Z #43 
0.492 DEBUG Found not-modified response for: https://pypi.org/simple/blake3/ 2025-09-07T10:21:29.5333172Z #43 0.492 DEBUG Found stale response for: https://pypi.org/simple/aiohttp/ 2025-09-07T10:21:29.5333841Z #43 0.492 DEBUG Sending revalidation request for: https://pypi.org/simple/aiohttp/ 2025-09-07T10:21:29.5334585Z #43 0.492 DEBUG Found not-modified response for: https://pypi.org/simple/py-cpuinfo/ 2025-09-07T10:21:29.5335327Z #43 0.492 DEBUG Found not-modified response for: https://pypi.org/simple/transformers/ 2025-09-07T10:21:29.5336101Z #43 0.492 DEBUG Found not-modified response for: https://pypi.org/simple/fastapi/ 2025-09-07T10:21:29.5337077Z #43 0.493 DEBUG Found not-modified response for: https://pypi.org/simple/pydantic/ 2025-09-07T10:21:29.5337915Z #43 0.494 DEBUG Found not-modified response for: https://pypi.org/simple/prometheus-client/ 2025-09-07T10:21:29.5338823Z #43 0.494 DEBUG Found not-modified response for: https://pypi.org/simple/prometheus-fastapi-instrumentator/ 2025-09-07T10:21:29.5339666Z #43 0.494 DEBUG Found not-modified response for: https://pypi.org/simple/tiktoken/ 2025-09-07T10:21:29.5340436Z #43 0.495 DEBUG Found not-modified response for: https://pypi.org/simple/lm-format-enforcer/ 2025-09-07T10:21:29.5341204Z #43 0.495 DEBUG Found not-modified response for: https://pypi.org/simple/llguidance/ 2025-09-07T10:21:29.5341971Z #43 0.495 DEBUG Found not-modified response for: https://pypi.org/simple/outlines-core/ 2025-09-07T10:21:29.5342731Z #43 0.495 DEBUG Found not-modified response for: https://pypi.org/simple/diskcache/ 2025-09-07T10:21:29.5343522Z #43 0.495 DEBUG Found not-modified response for: https://pypi.org/simple/lark/ 2025-09-07T10:21:29.5344198Z #43 0.495 DEBUG Found not-modified response for: https://pypi.org/simple/xgrammar/ 2025-09-07T10:21:29.5344926Z #43 0.496 DEBUG Found not-modified response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:21:29.5345916Z #43 0.496 DEBUG Found not-modified response 
for: https://pypi.org/simple/filelock/ 2025-09-07T10:21:29.5346666Z #43 0.496 DEBUG Found not-modified response for: https://pypi.org/simple/partial-json-parser/ 2025-09-07T10:21:29.5347419Z #43 0.496 DEBUG Found not-modified response for: https://pypi.org/simple/msgspec/ 2025-09-07T10:21:29.5348094Z #43 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/gguf/ 2025-09-07T10:21:29.5349486Z #43 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/mistral-common/ 2025-09-07T10:21:29.5350516Z #43 0.497 DEBUG Found not-modified response for: https://pypi.org/simple/opencv-python-headless/ 2025-09-07T10:21:29.5351282Z #43 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/pyyaml/ 2025-09-07T10:21:29.5351969Z #43 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/six/ 2025-09-07T10:21:29.5352678Z #43 0.498 DEBUG Found not-modified response for: https://pypi.org/simple/setuptools/ 2025-09-07T10:21:29.5353392Z #43 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/einops/ 2025-09-07T10:21:29.5354149Z #43 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/compressed-tensors/ 2025-09-07T10:21:29.5354887Z #43 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/depyf/ 2025-09-07T10:21:29.5355611Z #43 0.499 DEBUG Found not-modified response for: https://pypi.org/simple/cloudpickle/ 2025-09-07T10:21:29.5356449Z #43 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/watchfiles/ 2025-09-07T10:21:29.5357233Z #43 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/python-json-logger/ 2025-09-07T10:21:29.5357991Z #43 0.500 DEBUG Found not-modified response for: https://pypi.org/simple/scipy/ 2025-09-07T10:21:29.5358665Z #43 0.502 DEBUG Found not-modified response for: https://pypi.org/simple/ninja/ 2025-09-07T10:21:29.5359380Z #43 0.502 DEBUG Found not-modified response for: https://pypi.org/simple/pybase64/ 
2025-09-07T10:21:29.5360083Z #43 0.503 DEBUG Found not-modified response for: https://pypi.org/simple/cbor2/ 2025-09-07T10:21:29.5360821Z #43 0.503 DEBUG Found not-modified response for: https://pypi.org/simple/setproctitle/ 2025-09-07T10:21:29.5361701Z #43 0.503 DEBUG Found not-modified response for: https://pypi.org/simple/openai-harmony/ 2025-09-07T10:21:29.5362403Z #43 0.504 DEBUG Found not-modified response for: https://pypi.org/simple/numba/ 2025-09-07T10:21:29.5363105Z #43 0.505 DEBUG Found not-modified response for: https://pypi.org/simple/tokenizers/ 2025-09-07T10:21:29.5363838Z #43 0.507 DEBUG Found not-modified response for: https://pypi.org/simple/regex/ 2025-09-07T10:21:29.5364518Z #43 0.517 DEBUG Found not-modified response for: https://pypi.org/simple/aiohttp/ 2025-09-07T10:21:29.5365242Z #43 0.523 DEBUG Searching for a compatible version of lm-format-enforcer (>=0.11.3, <0.11.3+) 2025-09-07T10:21:29.5366237Z #43 0.523 DEBUG Selecting: lm-format-enforcer==0.11.3 [compatible] (lm_format_enforcer-0.11.3-py3-none-any.whl) 2025-09-07T10:21:29.5367696Z #43 0.530 DEBUG Found installed version of numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies * 2025-09-07T10:21:29.6172419Z #43 0.531 DEBUG Found installed version of pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies * 2025-09-07T10:21:29.6174134Z #43 0.534 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10 2025-09-07T10:21:29.6175946Z #43 0.534 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies >=3.16.1 2025-09-07T10:21:29.6176818Z #43 0.535 DEBUG No cache entry for: https://pypi.org/simple/ray/ 2025-09-07T10:21:29.6178366Z #43 0.535 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/40/01/2e730bd1c25392fc32e3268e02446f0d77cb51a2c3a8486b1798e34d5805/protobuf-6.32.0-cp39-abi3-manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6180808Z #43 0.536 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/bf/b9/b0eb3f3cbcb734d930fdf839431606844a825b23eaf9a6ab371edac8162c/psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6183326Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/0e/66/d781ab0636570d32c745c4e389b1c6b713115905cca69ab6233508622edd/pyzmq-27.0.2-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6186048Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/00/e1/47887212baa7bc0532880d33d5eafbdb46fcc4b53789b903282a74a85b5b/openai-1.106.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6188332Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/6c/56/3124f61d37a7a4e7cc96afc5492c78ba0cb551151e530b54669ddd1436ef/cachetools-6.2.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6191071Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/04/88/14f2f4a2b922d8b39be45bf63d79e6cd3a9b2f248b2fcb98a69b12af12f5/sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6193928Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl.metadata 2025-09-07T10:21:29.6196449Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6198646Z #43 0.537 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/5c/04/a86bfb3c20e859e43ead0b13be59afd98feb166ea929e76fa3d190f65f6e/blake3-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6201383Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e0/a9/023730ba63db1e494a271cb018dcd361bd2c917ba7004c3e49d5daf795a2/py_cpuinfo-9.0.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6203948Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/6a/c0/ec2b1c8712ca690e5d61979dee872603e92b8a32f94cc1b72d53beab008a/pydantic-2.11.7-py3-none-any.whl.metadata 2025-09-07T10:21:29.6206378Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/32/ae/ec06af4fe3ee72d16973474f122541746196aaa16cea6f66d18b963c6177/prometheus_client-0.22.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6209247Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/27/72/0824c18f3bc75810f55dacc2dd933f6ec829771180245ae3cc976195dec0/prometheus_fastapi_instrumentator-7.1.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6212091Z #43 0.537 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/71/7c/283c3dd35e00e22a7803a0b2a65251347b745474a82399be058bde1c9f15/transformers-4.56.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6214752Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d5/2d/4d77f6feb9292bfdd23d5813e442b3bba883f42d0ac78ef5fdc56873f756/tiktoken-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6217505Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a0/ef/11292bb0b85cf4c93447cab5a29f64576ed14d3ab4280e35ddd23486594a/lm_format_enforcer-0.11.3-py3-none-any.whl.metadata 2025-09-07T10:21:29.6219740Z #43 0.538 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata 2025-09-07T10:21:29.6222364Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/2d/00/d90b10b962b4277f5e64a78b6609968859ff86889f5b898c1a778c06ec00/lark-1.2.2-py3-none-any.whl.metadata 2025-09-07T10:21:29.6223803Z #43 0.538 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: interegular>=0.3.2 2025-09-07T10:21:29.6224608Z #43 0.538 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: packaging* 2025-09-07T10:21:29.6226238Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/cb/40/1f922794af3dc7503f19319a8804b398a161a2cd54183cff8b12225b8d85/partial_json_parser-0.2.1.1.post6-py3-none-any.whl.metadata 2025-09-07T10:21:29.6227776Z #43 0.538 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: pydantic>=1.10.8 2025-09-07T10:21:29.6228740Z #43 0.538 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: pyyaml* 2025-09-07T10:21:29.6230375Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d0/ef/c5422ce8af73928d194a6606f8ae36e93a52fd5e8df5abd366903a5ca8da/msgspec-0.19.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6232175Z #43 0.538 DEBUG Searching for a compatible version of outlines-core{platform_machine != 's390x'} (>=0.2.10, <0.2.10+) 2025-09-07T10:21:29.6233590Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/fc/31/6a93a887617ee7deeaa602ca3d02d1c12a6cb8a742a695de5d128f5fa46a/gguf-0.17.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6235108Z #43 0.538 DEBUG Selecting: outlines-core==0.2.10 [compatible] (outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6236978Z #43 0.538 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/89/53/e19c21e0c4eb1275c3e2c97b081103b6dfb3938172264d283a519bf728b9/opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T10:21:29.6238703Z #43 0.538 DEBUG Adding transitive dependency for outlines-core==0.2.10: outlines-core==0.2.10 2025-09-07T10:21:29.6239702Z #43 0.538 DEBUG Adding transitive dependency for outlines-core==0.2.10: outlines-core{platform_machine != 's390x'}==0.2.10 2025-09-07T10:21:29.6242130Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b9/2b/614b4752f2e127db5cc206abc23a8c19678e92b23c3db30fc86ab731d3bd/PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6243596Z #43 0.538 DEBUG Searching for a compatible version of outlines-core (==0.2.10) 2025-09-07T10:21:29.6244902Z #43 0.538 DEBUG Selecting: outlines-core==0.2.10 [compatible] (outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6246664Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/87/62/9773de14fe6c45c23649e98b83231fffd7b9892b6cf863251dc2afa73643/einops-0.8.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6248589Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d2/81/e3073017a8f5c75169e79108eda209e6089e3f96c9f197d307cbda7df71c/compressed_tensors-0.11.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6251230Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/28/4d/1192acbcdc5e843f5e5d51f6e8788f2b60a9fe0b578ac385ded67a0b0b26/depyf-0.19.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6253217Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/7e/e8/64c37fadfc2816a7701fa8a6ed8d87327c7d54eacfbfb6edab14a2f2be75/cloudpickle-3.1.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6255866Z #43 0.538 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/8c/77/e3362fe308358dc9f8588102481e599c83e1b91c2ae843780a7ded939a35/watchfiles-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6258102Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/08/20/0f2523b9e50a8052bc6a8b732dfc8568abbdc42010aef03a2d750bdab3b2/python_json_logger-3.3.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6260416Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/51/1e/79023ca3bbb13a015d7d2757ecca3b81293c663694c35d6541b4dca53e98/scipy-1.16.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T10:21:29.6262867Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/de/5e/3bf5acea47a96a28c121b167f5ef659cf71208b19e52a88cdfa5c37f1fcc/aiohttp-3.12.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6265341Z #43 0.538 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b2/02/5c891bb5fe0691cc1bad336e3a94b9097fbcf9707ec8ddc1dce9f0397289/regex-2025.9.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6267732Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d4/61/aeab3402c26874b74bb67a7f2c4b569dde29b51032c5384db592e7b216f4/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6270061Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/1f/ec/dcdcace0ffcf3a532cca910e0c351b62d3a7decf0b091ea8cf856d2a67a6/openai_harmony-0.0.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6272465Z #43 0.539 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/d0/99/71630546b9395b095f4082be41165d1078204d1696c2d9baade3de3202d0/setproctitle-1.3.7-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.metadata 2025-09-07T10:21:29.6274983Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a8/6e/3499eaa2b858c7695a447b6311303f06ffc90fc2c45851337121661f1f5c/cbor2-5.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6277528Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ee/87/d9baf98cbfc37b8657290ad4421f3a3c36aa0eafe4872c5859cfb52f3448/pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata 2025-09-07T10:21:29.6279969Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ed/de/0e6edf44d6a04dabd0318a519125ed0415ce437ad5a1ec9b9be03d9048cf/ninja-1.13.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T10:21:29.6281433Z #43 0.539 DEBUG Found stale response for: https://pypi.org/simple/interegular/ 2025-09-07T10:21:29.6282152Z #43 0.539 DEBUG Sending revalidation request for: https://pypi.org/simple/interegular/ 2025-09-07T10:21:29.6282840Z #43 0.539 DEBUG Found stale response for: https://pypi.org/simple/packaging/ 2025-09-07T10:21:29.6283605Z #43 0.539 DEBUG Sending revalidation request for: https://pypi.org/simple/packaging/ 2025-09-07T10:21:29.6285136Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c2/83/db792ce386d1c13d875a03d6ff5ba31612cfb558ecf5b945910db9505574/outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6286751Z #43 0.539 DEBUG Searching for a compatible version of outlines-core{platform_machine != 's390x'} (==0.2.10) 2025-09-07T10:21:29.6287789Z #43 0.539 DEBUG Selecting: outlines-core==0.2.10 [compatible] 
(outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6288697Z #43 0.539 DEBUG Searching for a compatible version of diskcache (>=5.6.3, <5.6.3+) 2025-09-07T10:21:29.6289429Z #43 0.539 DEBUG Selecting: diskcache==5.6.3 [compatible] (diskcache-5.6.3-py3-none-any.whl) 2025-09-07T10:21:29.6290232Z #43 0.539 DEBUG Searching for a compatible version of lark (>=1.2.2, <1.2.2+) 2025-09-07T10:21:29.6291084Z #43 0.539 DEBUG Selecting: lark==1.2.2 [compatible] (lark-1.2.2-py3-none-any.whl) 2025-09-07T10:21:29.6292348Z #43 0.539 DEBUG Searching for a compatible version of xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (>=0.1.23, <0.1.23+) 2025-09-07T10:21:29.6293692Z #43 0.539 DEBUG Selecting: xgrammar==0.1.23 [compatible] (xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6294613Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: xgrammar==0.1.23 2025-09-07T10:21:29.6295712Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}==0.1.23 2025-09-07T10:21:29.6296812Z #43 0.539 DEBUG Searching for a compatible version of xgrammar (==0.1.23) 2025-09-07T10:21:29.6297691Z #43 0.539 DEBUG Selecting: xgrammar==0.1.23 [compatible] (xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6299434Z #43 0.539 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a1/13/53d950b93a361ef73e5930050916fa36c23fade80ee05cfb0339c044e951/xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6300951Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: pydantic* 2025-09-07T10:21:29.6301622Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: torch>=1.10.0 2025-09-07T10:21:29.6302326Z #43 
0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: transformers>=4.38.0 2025-09-07T10:21:29.6303359Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: triton{platform_machine == 'x86_64' and sys_platform == 'linux'}* 2025-09-07T10:21:29.6304225Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: ninja* 2025-09-07T10:21:29.6304825Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: numpy* 2025-09-07T10:21:29.6305501Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: typing-extensions>=4.9.0 2025-09-07T10:21:29.6306573Z #43 0.539 DEBUG Searching for a compatible version of xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (==0.1.23) 2025-09-07T10:21:29.6307796Z #43 0.539 DEBUG Selecting: xgrammar==0.1.23 [compatible] (xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6308641Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: pydantic* 2025-09-07T10:21:29.6309343Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: torch>=1.10.0 2025-09-07T10:21:29.6310022Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: transformers>=4.38.0 2025-09-07T10:21:29.6310938Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: triton{platform_machine == 'x86_64' and sys_platform == 'linux'}* 2025-09-07T10:21:29.6311823Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: ninja* 2025-09-07T10:21:29.6312415Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: numpy* 2025-09-07T10:21:29.6313116Z #43 0.539 DEBUG Adding transitive dependency for xgrammar==0.1.23: typing-extensions>=4.9.0 2025-09-07T10:21:29.6314044Z #43 0.539 DEBUG Searching for a compatible version of compressed-tensors (>=0.11.0, <0.11.0+) 2025-09-07T10:21:29.6315163Z #43 0.539 DEBUG Selecting: compressed-tensors==0.11.0 [compatible] 
(compressed_tensors-0.11.0-py3-none-any.whl) 2025-09-07T10:21:29.6316260Z #43 0.539 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: torch>=1.7.0 2025-09-07T10:21:29.6317277Z #43 0.539 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: transformers* 2025-09-07T10:21:29.6318312Z #43 0.539 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: pydantic>=2.0 2025-09-07T10:21:29.6319434Z #43 0.539 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: frozendict* 2025-09-07T10:21:29.6320342Z #43 0.539 DEBUG Searching for a compatible version of depyf (>=0.19.0, <0.19.0+) 2025-09-07T10:21:29.6321418Z #43 0.539 DEBUG Selecting: depyf==0.19.0 [compatible] (depyf-0.19.0-py3-none-any.whl) 2025-09-07T10:21:29.6322444Z #43 0.540 DEBUG Adding transitive dependency for depyf==0.19.0: astor* 2025-09-07T10:21:29.6323251Z #43 0.540 DEBUG Adding transitive dependency for depyf==0.19.0: dill* 2025-09-07T10:21:29.6324255Z #43 0.540 DEBUG Searching for a compatible version of numba{python_full_version >= '3.10'} (>=0.61.2, <0.61.2+) 2025-09-07T10:21:29.6325427Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/triton/ 2025-09-07T10:21:29.6326208Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/triton/ 2025-09-07T10:21:29.6327081Z #43 0.540 DEBUG Selecting: numba==0.61.2 [compatible] (numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.6327964Z #43 0.540 DEBUG Adding transitive dependency for numba==0.61.2: numba==0.61.2 2025-09-07T10:21:29.6328710Z #43 0.540 DEBUG Adding transitive dependency for numba==0.61.2: numba{python_full_version >= '3.10'}==0.61.2 2025-09-07T10:21:29.6329440Z #43 0.540 DEBUG Searching for a compatible version of numba (==0.61.2) 2025-09-07T10:21:29.6330223Z #43 0.540 DEBUG Selecting: numba==0.61.2 [compatible] (numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.6331319Z #43 0.540 
DEBUG Found stale response for: https://pypi.org/simple/astor/ 2025-09-07T10:21:29.6331993Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/astor/ 2025-09-07T10:21:29.6332643Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/dill/ 2025-09-07T10:21:29.6333298Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/dill/ 2025-09-07T10:21:29.6334874Z #43 0.540 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/9a/2d/e518df036feab381c23a624dac47f8445ac55686ec7f11083655eb707da3/numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T10:21:29.6336456Z #43 0.540 DEBUG Adding transitive dependency for numba==0.61.2: llvmlite>=0.44.0.dev0, <0.45 2025-09-07T10:21:29.6337195Z #43 0.540 DEBUG Adding transitive dependency for numba==0.61.2: numpy>=1.24, <2.3 2025-09-07T10:21:29.6337953Z #43 0.540 DEBUG Searching for a compatible version of numba{python_full_version >= '3.10'} (==0.61.2) 2025-09-07T10:21:29.6338908Z #43 0.540 DEBUG Selecting: numba==0.61.2 [compatible] (numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.6339812Z #43 0.540 DEBUG Adding transitive dependency for numba==0.61.2: llvmlite>=0.44.0.dev0, <0.45 2025-09-07T10:21:29.6340544Z #43 0.540 DEBUG Adding transitive dependency for numba==0.61.2: numpy>=1.24, <2.3 2025-09-07T10:21:29.6341170Z #43 0.540 DEBUG Searching for a compatible version of regex (*) 2025-09-07T10:21:29.6342120Z #43 0.540 DEBUG Selecting: regex==2025.9.1 [compatible] (regex-2025.9.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.6343425Z #43 0.540 DEBUG Searching for a compatible version of cachetools (*) 2025-09-07T10:21:29.6344276Z #43 0.540 DEBUG Selecting: cachetools==6.2.0 [compatible] (cachetools-6.2.0-py3-none-any.whl) 2025-09-07T10:21:29.6344939Z #43 0.540 DEBUG Searching for a compatible version of psutil (*) 
2025-09-07T10:21:29.6346035Z #43 0.540 DEBUG Selecting: psutil==7.0.0 [compatible] (psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6347207Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/torch/ 2025-09-07T10:21:29.6348168Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/torch/ 2025-09-07T10:21:29.6349276Z #43 0.540 DEBUG Searching for a compatible version of sentencepiece (*) 2025-09-07T10:21:29.6350355Z #43 0.540 DEBUG Selecting: sentencepiece==0.2.1 [compatible] (sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.6352308Z #43 0.540 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/8c/3d/1e1db36cfd41f895d266b103df00ca5b3cbe965184df824dec5c08c6b803/numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6353946Z #43 0.540 DEBUG Searching for a compatible version of numpy (>=1.24, <2.3) 2025-09-07T10:21:29.6355285Z #43 0.540 DEBUG Selecting: numpy==2.2.6 [compatible] (numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6356215Z #43 0.540 DEBUG Searching for a compatible version of requests (>=2.26.0) 2025-09-07T10:21:29.6357015Z #43 0.540 DEBUG Selecting: requests==2.32.5 [compatible] (requests-2.32.5-py3-none-any.whl) 2025-09-07T10:21:29.6357805Z #43 0.540 DEBUG Adding transitive dependency for requests==2.32.5: charset-normalizer>=2, <4 2025-09-07T10:21:29.6358601Z #43 0.540 DEBUG Adding transitive dependency for requests==2.32.5: idna>=2.5, <4 2025-09-07T10:21:29.6359312Z #43 0.540 DEBUG Adding transitive dependency for requests==2.32.5: urllib3>=1.21.1, <3 2025-09-07T10:21:29.6360029Z #43 0.540 DEBUG Adding transitive dependency for requests==2.32.5: certifi>=2017.4.17 2025-09-07T10:21:29.6360783Z #43 0.540 DEBUG Searching for a compatible version of tqdm (*) 
2025-09-07T10:21:29.6361365Z #43 0.540 DEBUG Selecting: tqdm==4.67.1 [compatible] (tqdm-4.67.1-py3-none-any.whl) 2025-09-07T10:21:29.6361953Z #43 0.540 DEBUG Searching for a compatible version of blake3 (*) 2025-09-07T10:21:29.6362734Z #43 0.540 DEBUG Selecting: blake3==1.0.5 [compatible] (blake3-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6363568Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/frozendict/ 2025-09-07T10:21:29.6364324Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/frozendict/ 2025-09-07T10:21:29.6364971Z #43 0.540 DEBUG Searching for a compatible version of py-cpuinfo (*) 2025-09-07T10:21:29.6365642Z #43 0.540 DEBUG Selecting: py-cpuinfo==9.0.0 [compatible] (py_cpuinfo-9.0.0-py3-none-any.whl) 2025-09-07T10:21:29.6366621Z #43 0.540 DEBUG Searching for a compatible version of transformers (>=4.55.2) 2025-09-07T10:21:29.6367392Z #43 0.540 DEBUG Selecting: transformers==4.56.1 [compatible] (transformers-4.56.1-py3-none-any.whl) 2025-09-07T10:21:29.6368106Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/idna/ 2025-09-07T10:21:29.6368732Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/idna/ 2025-09-07T10:21:29.6369494Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: filelock* 2025-09-07T10:21:29.6370322Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: huggingface-hub>=0.34.0, <1.0 2025-09-07T10:21:29.6371338Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: numpy>=1.17 2025-09-07T10:21:29.6372094Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: packaging>=20.0 2025-09-07T10:21:29.6372857Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/urllib3/ 2025-09-07T10:21:29.6373538Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: pyyaml>=5.1 2025-09-07T10:21:29.6374249Z #43 0.540 DEBUG Sending 
revalidation request for: https://pypi.org/simple/urllib3/ 2025-09-07T10:21:29.6375057Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: regex<2019.12.17 | >=2019.12.17+ 2025-09-07T10:21:29.6375844Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: requests* 2025-09-07T10:21:29.6376603Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: tokenizers>=0.22.0, <=0.23.0+ 2025-09-07T10:21:29.6377412Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: safetensors>=0.4.3 2025-09-07T10:21:29.6378140Z #43 0.540 DEBUG Adding transitive dependency for transformers==4.56.1: tqdm>=4.27 2025-09-07T10:21:29.6378840Z #43 0.540 DEBUG Searching for a compatible version of tokenizers (>=0.22.0, <=0.23.0+) 2025-09-07T10:21:29.6379781Z #43 0.540 DEBUG Selecting: tokenizers==0.22.0 [compatible] (tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6380664Z #43 0.540 DEBUG Found stale response for: https://pypi.org/simple/certifi/ 2025-09-07T10:21:29.6381351Z #43 0.540 DEBUG Sending revalidation request for: https://pypi.org/simple/certifi/ 2025-09-07T10:21:29.6382177Z #43 0.540 DEBUG Adding transitive dependency for tokenizers==0.22.0: huggingface-hub>=0.16.4, <1.0 2025-09-07T10:21:29.6383017Z #43 0.540 DEBUG Searching for a compatible version of protobuf (*) 2025-09-07T10:21:29.6383739Z #43 0.540 DEBUG Selecting: protobuf==6.32.0 [compatible] (protobuf-6.32.0-cp39-abi3-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6384513Z #43 0.540 DEBUG Searching for a compatible version of fastapi[standard] (>=0.115.0) 2025-09-07T10:21:29.6385262Z #43 0.540 DEBUG Selecting: fastapi==0.116.1 [compatible] (fastapi-0.116.1-py3-none-any.whl) 2025-09-07T10:21:29.6385966Z #43 0.540 DEBUG Adding transitive dependency for fastapi==0.116.1: fastapi==0.116.1 2025-09-07T10:21:29.6386699Z #43 0.540 DEBUG Adding transitive dependency for fastapi==0.116.1: fastapi[standard]==0.116.1 
2025-09-07T10:21:29.6387377Z #43 0.540 DEBUG Searching for a compatible version of fastapi (==0.116.1) 2025-09-07T10:21:29.6388040Z #43 0.540 DEBUG Selecting: fastapi==0.116.1 [compatible] (fastapi-0.116.1-py3-none-any.whl) 2025-09-07T10:21:29.6388734Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/llvmlite/ 2025-09-07T10:21:29.6389396Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/llvmlite/ 2025-09-07T10:21:29.6390104Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/huggingface-hub/ 2025-09-07T10:21:29.6390831Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/huggingface-hub/ 2025-09-07T10:21:29.6392215Z #43 0.541 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e5/47/d63c60f59a59467fda0f93f46335c9d18526d7071f025cb5b89d5353ea42/fastapi-0.116.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6393540Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: starlette>=0.40.0, <0.48.0 2025-09-07T10:21:29.6394495Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: pydantic>=1.7.4, <1.8 | >=1.8+, <1.8.1 | >=1.8.1+, <2.0.0 | >=2.0.0+, <2.0.1 | >=2.0.1+, <2.1.0 | >=2.1.0+, <3.0.0 2025-09-07T10:21:29.6395453Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: typing-extensions>=4.8.0 2025-09-07T10:21:29.6396167Z #43 0.541 DEBUG Searching for a compatible version of fastapi[standard] (==0.116.1) 2025-09-07T10:21:29.6396874Z #43 0.541 DEBUG Selecting: fastapi==0.116.1 [compatible] (fastapi-0.116.1-py3-none-any.whl) 2025-09-07T10:21:29.6397643Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: fastapi-cli[standard]>=0.0.8 2025-09-07T10:21:29.6398363Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: httpx>=0.23.0 2025-09-07T10:21:29.6399054Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: jinja2>=3.1.5 2025-09-07T10:21:29.6399792Z #43 0.541 DEBUG Adding transitive 
dependency for fastapi==0.116.1: python-multipart>=0.0.18 2025-09-07T10:21:29.6400551Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: email-validator>=2.0.0 2025-09-07T10:21:29.6401307Z #43 0.541 DEBUG Adding transitive dependency for fastapi==0.116.1: uvicorn[standard]>=0.12.0 2025-09-07T10:21:29.6401954Z #43 0.541 DEBUG Searching for a compatible version of aiohttp (*) 2025-09-07T10:21:29.6402762Z #43 0.541 DEBUG Selecting: aiohttp==3.12.15 [compatible] (aiohttp-3.12.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6403671Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: aiohappyeyeballs>=2.5.0 2025-09-07T10:21:29.6404399Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: aiosignal>=1.4.0 2025-09-07T10:21:29.6405073Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: attrs>=17.3.0 2025-09-07T10:21:29.6405743Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: frozenlist>=1.1.1 2025-09-07T10:21:29.6406451Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: multidict>=4.5, <7.0 2025-09-07T10:21:29.6407146Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: propcache>=0.2.0 2025-09-07T10:21:29.6407870Z #43 0.541 DEBUG Adding transitive dependency for aiohttp==3.12.15: yarl>=1.17.0, <2.0 2025-09-07T10:21:29.6408539Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/fastapi-cli/ 2025-09-07T10:21:29.6409260Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/fastapi-cli/ 2025-09-07T10:21:29.6409930Z #43 0.541 DEBUG Searching for a compatible version of openai (>=1.99.1) 2025-09-07T10:21:29.6410603Z #43 0.541 DEBUG Selecting: openai==1.106.1 [compatible] (openai-1.106.1-py3-none-any.whl) 2025-09-07T10:21:29.6411584Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/jinja2/ 2025-09-07T10:21:29.6412252Z #43 0.541 DEBUG Sending revalidation request for: 
https://pypi.org/simple/jinja2/ 2025-09-07T10:21:29.6412966Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: anyio>=3.5.0, <5 2025-09-07T10:21:29.6413669Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: distro>=1.7.0, <2 2025-09-07T10:21:29.6414365Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: httpx>=0.23.0, <1 2025-09-07T10:21:29.6415069Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: jiter>=0.4.0, <1 2025-09-07T10:21:29.6415765Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: pydantic>=1.9.0, <3 2025-09-07T10:21:29.6416447Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: sniffio* 2025-09-07T10:21:29.6417090Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: tqdm>4 2025-09-07T10:21:29.6417804Z #43 0.541 DEBUG Adding transitive dependency for openai==1.106.1: typing-extensions>=4.11, <5 2025-09-07T10:21:29.6418551Z #43 0.541 DEBUG Searching for a compatible version of pydantic (>=2.11.7, <3.0.0) 2025-09-07T10:21:29.6419264Z #43 0.541 DEBUG Selecting: pydantic==2.11.7 [compatible] (pydantic-2.11.7-py3-none-any.whl) 2025-09-07T10:21:29.6420009Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/email-validator/ 2025-09-07T10:21:29.6420764Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/email-validator/ 2025-09-07T10:21:29.6421555Z #43 0.541 DEBUG Adding transitive dependency for pydantic==2.11.7: annotated-types>=0.6.0 2025-09-07T10:21:29.6422487Z #43 0.541 DEBUG Adding transitive dependency for pydantic==2.11.7: pydantic-core>=2.33.2, <2.33.2+ 2025-09-07T10:21:29.6423277Z #43 0.541 DEBUG Adding transitive dependency for pydantic==2.11.7: typing-extensions>=4.12.2 2025-09-07T10:21:29.6424058Z #43 0.541 DEBUG Adding transitive dependency for pydantic==2.11.7: typing-inspection>=0.4.0 2025-09-07T10:21:29.6424742Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/httpx/ 
2025-09-07T10:21:29.6425421Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/httpx/ 2025-09-07T10:21:29.6426116Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T10:21:29.6426879Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T10:21:29.6427633Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/python-multipart/ 2025-09-07T10:21:29.6428370Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/python-multipart/ 2025-09-07T10:21:29.6429084Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/starlette/ 2025-09-07T10:21:29.6429756Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/starlette/ 2025-09-07T10:21:29.6430434Z #43 0.541 DEBUG Found stale response for: https://pypi.org/simple/uvicorn/ 2025-09-07T10:21:29.6431095Z #43 0.541 DEBUG Sending revalidation request for: https://pypi.org/simple/uvicorn/ 2025-09-07T10:21:29.6432101Z #43 0.542 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.11, <5 2025-09-07T10:21:29.6433430Z #43 0.542 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.12.2, <5 2025-09-07T10:21:29.6434501Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/aiohappyeyeballs/ 2025-09-07T10:21:29.6435252Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/aiohappyeyeballs/ 2025-09-07T10:21:29.6435978Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/aiosignal/ 2025-09-07T10:21:29.6436649Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/aiosignal/ 2025-09-07T10:21:29.6437355Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/attrs/ 2025-09-07T10:21:29.6437990Z #43 0.542 DEBUG 
Sending revalidation request for: https://pypi.org/simple/attrs/ 2025-09-07T10:21:29.6438656Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/propcache/ 2025-09-07T10:21:29.6439328Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/propcache/ 2025-09-07T10:21:29.6439995Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/anyio/ 2025-09-07T10:21:29.6440634Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/anyio/ 2025-09-07T10:21:29.6441267Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/distro/ 2025-09-07T10:21:29.6441920Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/distro/ 2025-09-07T10:21:29.6442564Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/sniffio/ 2025-09-07T10:21:29.6443255Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/sniffio/ 2025-09-07T10:21:29.6443957Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/annotated-types/ 2025-09-07T10:21:29.6444686Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/annotated-types/ 2025-09-07T10:21:29.6445438Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/typing-inspection/ 2025-09-07T10:21:29.6446183Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-inspection/ 2025-09-07T10:21:29.6446913Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/safetensors/ 2025-09-07T10:21:29.6447602Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/safetensors/ 2025-09-07T10:21:29.6448299Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/frozenlist/ 2025-09-07T10:21:29.6449180Z #43 0.542 DEBUG Sending revalidation request for: https://pypi.org/simple/frozenlist/ 2025-09-07T10:21:29.6450283Z #43 0.542 DEBUG Found stale response for: https://pypi.org/simple/jiter/ 2025-09-07T10:21:29.6451297Z #43 0.542 DEBUG Sending 
revalidation request for: https://pypi.org/simple/jiter/ 2025-09-07T10:21:29.6452267Z #43 0.543 DEBUG Found stale response for: https://pypi.org/simple/yarl/ 2025-09-07T10:21:29.6453137Z #43 0.543 DEBUG Sending revalidation request for: https://pypi.org/simple/yarl/ 2025-09-07T10:21:29.6453822Z #43 0.547 DEBUG Found stale response for: https://pypi.org/simple/multidict/ 2025-09-07T10:21:29.6454515Z #43 0.547 DEBUG Sending revalidation request for: https://pypi.org/simple/multidict/ 2025-09-07T10:21:29.6455270Z #43 0.547 DEBUG Found not-modified response for: https://pypi.org/simple/packaging/ 2025-09-07T10:21:29.6455945Z #43 0.548 DEBUG Found installed version of packaging==25.0 that satisfies * 2025-09-07T10:21:29.6456601Z #43 0.548 DEBUG Found installed version of packaging==25.0 that satisfies >=20.0 2025-09-07T10:21:29.6457276Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/triton/ 2025-09-07T10:21:29.6457976Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/astor/ 2025-09-07T10:21:29.6458656Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/dill/ 2025-09-07T10:21:29.6459354Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/interegular/ 2025-09-07T10:21:29.6460071Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/torch/ 2025-09-07T10:21:29.6460775Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/frozendict/ 2025-09-07T10:21:29.6461574Z #43 0.550 DEBUG Found not-modified response for: https://pypi.org/simple/urllib3/ 2025-09-07T10:21:29.6462724Z #43 0.551 DEBUG Found not-modified response for: https://pypi.org/simple/certifi/ 2025-09-07T10:21:29.6463544Z #43 0.551 DEBUG Found not-modified response for: https://pypi.org/simple/llvmlite/ 2025-09-07T10:21:29.6464326Z #43 0.551 DEBUG Found not-modified response for: https://pypi.org/simple/idna/ 2025-09-07T10:21:29.6465202Z #43 0.552 DEBUG Found not-modified response for: 
https://pypi.org/simple/huggingface-hub/ 2025-09-07T10:21:29.6465954Z #43 0.552 DEBUG Found not-modified response for: https://pypi.org/simple/fastapi-cli/ 2025-09-07T10:21:29.6466674Z #43 0.552 DEBUG Found not-modified response for: https://pypi.org/simple/httpx/ 2025-09-07T10:21:29.6467576Z #43 0.552 DEBUG Found not-modified response for: https://pypi.org/simple/email-validator/ 2025-09-07T10:21:29.6468434Z #43 0.552 DEBUG Found not-modified response for: https://pypi.org/simple/jinja2/ 2025-09-07T10:21:29.6469376Z #43 0.562 DEBUG Found not-modified response for: https://pypi.org/simple/frozenlist/ 2025-09-07T10:21:29.6470137Z #43 0.563 DEBUG Found not-modified response for: https://pypi.org/simple/yarl/ 2025-09-07T10:21:29.6470784Z #43 0.565 DEBUG Found not-modified response for: https://pypi.org/simple/jiter/ 2025-09-07T10:21:29.6471476Z #43 0.565 DEBUG Found not-modified response for: https://pypi.org/simple/safetensors/ 2025-09-07T10:21:29.6472278Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/uvicorn/ 2025-09-07T10:21:29.6472942Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/anyio/ 2025-09-07T10:21:29.6473638Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/aiosignal/ 2025-09-07T10:21:29.6474371Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/aiohappyeyeballs/ 2025-09-07T10:21:29.6475122Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/propcache/ 2025-09-07T10:21:29.6475826Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/starlette/ 2025-09-07T10:21:29.6476576Z #43 0.566 DEBUG Found not-modified response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T10:21:29.6477453Z #43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/annotated-types/ 2025-09-07T10:21:29.6478254Z #43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/distro/ 2025-09-07T10:21:29.6479020Z 
#43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/attrs/ 2025-09-07T10:21:29.6479869Z #43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/typing-inspection/ 2025-09-07T10:21:29.6480764Z #43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/sniffio/ 2025-09-07T10:21:29.6481613Z #43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/python-multipart/ 2025-09-07T10:21:29.6482343Z #43 0.567 DEBUG Found not-modified response for: https://pypi.org/simple/multidict/ 2025-09-07T10:21:29.6483044Z #43 0.569 DEBUG Found stale response for: https://pypi.org/simple/pydantic-core/ 2025-09-07T10:21:29.6483749Z #43 0.569 DEBUG Sending revalidation request for: https://pypi.org/simple/pydantic-core/ 2025-09-07T10:21:29.6484931Z #43 0.570 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=1.10.0 2025-09-07T10:21:29.6486245Z #43 0.574 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies >=3.1.5 2025-09-07T10:21:29.6487725Z #43 0.574 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c3/88/97eef84f48fa04fbd6750e62dcceafba6c63c81b7ac1420856c8dcc0a3f9/astor-0.8.1-py2.py3-none-any.whl.metadata 2025-09-07T10:21:29.6489604Z #43 0.574 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/50/3d/9373ad9c56321fdab5b41197068e1d8c25883b3fea29dd361f9b55116869/dill-0.4.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6491794Z #43 0.574 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c4/01/72d6472f80651673716d1deda2a5bbb633e563ecf94f4479da5519d69d25/interegular-0.3.3-py37-none-any.whl.metadata 2025-09-07T10:21:29.6493797Z #43 0.574 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/ba/d0/d482c39cee2ab2978a892558cf130681d4574ea208e162da8958b31e9250/frozendict-2.4.6-py312-none-any.whl.metadata 2025-09-07T10:21:29.6495799Z #43 0.574 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6497739Z #43 0.575 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e5/48/1549795ba7742c948d2ad169c1c8cdbae65bc450d6cd753d124b17c8cd32/certifi-2025.8.3-py3-none-any.whl.metadata 2025-09-07T10:21:29.6499898Z #43 0.575 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/cb/da/8341fd3056419441286c8e26bf436923021005ece0bff5f41906476ae514/llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6502040Z #43 0.575 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/76/c6/c88e154df9c4e1a2a66ccf0005a88dfb2650c1dffb6f5ce603dfbd452ce3/idna-3.10-py3-none-any.whl.metadata 2025-09-07T10:21:29.6504081Z #43 0.575 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/39/7b/bb06b061991107cd8783f300adff3e7b7f284e330fd82f507f2a1417b11d/huggingface_hub-0.34.4-py3-none-any.whl.metadata 2025-09-07T10:21:29.6506042Z #43 0.575 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6508254Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/de/15/545e2b6cf2e3be84bc1ed85613edd75b8aea69807a71c26f4ca6a9258e82/email_validator-2.3.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6510747Z #43 0.576 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/8d/db/48421f62a6f77c553575201e89048e97198046b793f4a089c79a6e3268bd/frozenlist-1.7.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6514023Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/98/28/3ab7acc5b51f4434b181b0cee8f1f4b77a65919700a355fb3617f9488874/yarl-1.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6517009Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e2/ba/77013b0b8ba904bf3762f11e0129b8928bff7f978a81838dfcc958ad5728/jiter-0.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6519714Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/fe/5d/5a514d7b88e310c8b146e2404e0dc161282e78634d9358975fd56dfd14be/safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6522704Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/6f/12/e5e0282d673bb9746bacfb6e2dba8719989d3660cdb2ea79aee9a9651afb/anyio-4.10.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6525220Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/fb/76/641ae371508676492379f16e2fa48f4e2c11741bd63c48be4b12a6b09cba/aiosignal-1.4.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6527947Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/0f/15/5bf3b99495fb160b63f95972b81750f18f7f4e02ad051373b669d17d44f2/aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6530155Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/37/7c/54fd5301ef38505ab235d98827207176a5c9b2aa61939b10a460ca53e123/propcache-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6532637Z #43 0.576 DEBUG Found fresh 
response for: https://files.pythonhosted.org/packages/ce/fd/901cfa59aaa5b30a99e16876f11abe38b59a1a2c51ffb3d7142bb6089069/starlette-0.47.3-py3-none-any.whl.metadata 2025-09-07T10:21:29.6536186Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/16/ab/0233c3231af734f5dfcf0844aa9582d5a1466c985bbed6cedab85af9bfe3/charset_normalizer-3.4.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6539597Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6542391Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6544411Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/77/06/bb80f5f86020c4551da315d78b3ab75e8228f89f0162f2c3a819e407941a/attrs-25.3.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.6546317Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6548244Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6550605Z #43 0.576 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/45/58/38b5afbc1a800eeea951b9285d3912613f2603bdf897a4ab0f4bd7f405fc/python_multipart-0.0.20-py3-none-any.whl.metadata 2025-09-07T10:21:29.6552920Z #43 0.576 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/af/65/753a2d8b05daf496f4a9c367fe844e90a1b2cac78e2be2c844200d10cc4c/multidict-6.6.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6554611Z #43 0.576 DEBUG Found not-modified response for: https://pypi.org/simple/pydantic-core/ 2025-09-07T10:21:29.6555375Z #43 0.588 DEBUG Searching for a compatible version of pydantic-core (>=2.33.2, <2.33.2+) 2025-09-07T10:21:29.6556350Z #43 0.588 DEBUG Selecting: pydantic-core==2.33.2 [compatible] (pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6558172Z #43 0.588 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/f9/41/4b043778cf9c4285d59742281a769eac371b9e47e35f98ad321349cc5d61/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6559934Z #43 0.588 DEBUG Adding transitive dependency for pydantic-core==2.33.2: typing-extensions>=4.6.0, <4.7.0 | >=4.7.0+ 2025-09-07T10:21:29.6560795Z #43 0.588 DEBUG Searching for a compatible version of prometheus-client (>=0.18.0) 2025-09-07T10:21:29.6561738Z #43 0.588 DEBUG Selecting: prometheus-client==0.22.1 [compatible] (prometheus_client-0.22.1-py3-none-any.whl) 2025-09-07T10:21:29.6562460Z #43 0.588 DEBUG Searching for a compatible version of pillow (*) 2025-09-07T10:21:29.6563423Z #43 0.588 DEBUG Found installed version of pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies * 2025-09-07T10:21:29.6564361Z #43 0.588 DEBUG Selecting: pillow==11.3.0 [installed] (installed) 2025-09-07T10:21:29.6565251Z #43 0.588 DEBUG Searching for a compatible version of prometheus-fastapi-instrumentator (>=7.0.0) 2025-09-07T10:21:29.6566715Z #43 0.588 DEBUG Selecting: prometheus-fastapi-instrumentator==7.1.0 [compatible] (prometheus_fastapi_instrumentator-7.1.0-py3-none-any.whl) 2025-09-07T10:21:29.6568313Z #43 
0.588 DEBUG Adding transitive dependency for prometheus-fastapi-instrumentator==7.1.0: prometheus-client>=0.8.0, <1.0.0 2025-09-07T10:21:29.6569400Z #43 0.588 DEBUG Adding transitive dependency for prometheus-fastapi-instrumentator==7.1.0: starlette>=0.30.0, <1.0.0 2025-09-07T10:21:29.6570294Z #43 0.588 DEBUG Searching for a compatible version of tiktoken (>=0.6.0) 2025-09-07T10:21:29.6571437Z #43 0.588 DEBUG Selecting: tiktoken==0.11.0 [compatible] (tiktoken-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6572357Z #43 0.588 DEBUG Adding transitive dependency for tiktoken==0.11.0: regex>=2022.1.18 2025-09-07T10:21:29.6573065Z #43 0.588 DEBUG Adding transitive dependency for tiktoken==0.11.0: requests>=2.26.0 2025-09-07T10:21:29.6574235Z #43 0.588 DEBUG Searching for a compatible version of llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (>=0.7.11, <0.8.0) 2025-09-07T10:21:29.6575564Z #43 0.588 DEBUG Selecting: llguidance==0.7.30 [compatible] (llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6576509Z #43 0.588 DEBUG Adding transitive dependency for llguidance==0.7.30: llguidance==0.7.30 2025-09-07T10:21:29.6577673Z #43 0.588 DEBUG Adding transitive dependency for llguidance==0.7.30: llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}==0.7.30 2025-09-07T10:21:29.6578754Z #43 0.588 DEBUG Searching for a compatible version of llguidance (==0.7.30) 2025-09-07T10:21:29.6579648Z #43 0.588 DEBUG Selecting: llguidance==0.7.30 [compatible] (llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6581469Z #43 0.588 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/af/80/5a40b9689f17612434b820854cba9b8cabd5142072c491b5280fe5f7a35e/llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 
2025-09-07T10:21:29.6583472Z #43 0.588 DEBUG Searching for a compatible version of llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (==0.7.30) 2025-09-07T10:21:29.6584724Z #43 0.588 DEBUG Selecting: llguidance==0.7.30 [compatible] (llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6585633Z #43 0.588 DEBUG Searching for a compatible version of typing-extensions (>=4.12.2, <5) 2025-09-07T10:21:29.6586646Z #43 0.588 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.12.2, <5 2025-09-07T10:21:29.6587622Z #43 0.588 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T10:21:29.6588232Z #43 0.588 DEBUG Searching for a compatible version of filelock (>=3.16.1) 2025-09-07T10:21:29.6589128Z #43 0.588 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies >=3.16.1 2025-09-07T10:21:29.6589959Z #43 0.589 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T10:21:29.6590539Z #43 0.589 DEBUG Searching for a compatible version of partial-json-parser (*) 2025-09-07T10:21:29.6591410Z #43 0.589 DEBUG Selecting: partial-json-parser==0.2.1.1.post6 [compatible] (partial_json_parser-0.2.1.1.post6-py3-none-any.whl) 2025-09-07T10:21:29.6592229Z #43 0.589 DEBUG Searching for a compatible version of pyzmq (>=25.0.0) 2025-09-07T10:21:29.6593025Z #43 0.589 DEBUG Selecting: pyzmq==27.0.2 [compatible] (pyzmq-27.0.2-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.6593794Z #43 0.589 DEBUG Searching for a compatible version of msgspec (*) 2025-09-07T10:21:29.6594601Z #43 0.589 DEBUG Selecting: msgspec==0.19.0 [compatible] (msgspec-0.19.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6595413Z #43 0.589 DEBUG Searching for a compatible version of 
gguf (>=0.13.0) 2025-09-07T10:21:29.6596014Z #43 0.589 DEBUG Selecting: gguf==0.17.1 [compatible] (gguf-0.17.1-py3-none-any.whl) 2025-09-07T10:21:29.6596651Z #43 0.589 DEBUG Adding transitive dependency for gguf==0.17.1: numpy>=1.17 2025-09-07T10:21:29.6597242Z #43 0.589 DEBUG Adding transitive dependency for gguf==0.17.1: pyyaml>=5.1 2025-09-07T10:21:29.6597872Z #43 0.589 DEBUG Adding transitive dependency for gguf==0.17.1: tqdm>=4.27 2025-09-07T10:21:29.6598520Z #43 0.589 DEBUG Searching for a compatible version of mistral-common[audio] (>=1.8.2) 2025-09-07T10:21:29.6599270Z #43 0.589 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T10:21:29.6600085Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common==1.8.4 2025-09-07T10:21:29.6600926Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common[audio]==1.8.4 2025-09-07T10:21:29.6601669Z #43 0.589 DEBUG Searching for a compatible version of mistral-common (==1.8.4) 2025-09-07T10:21:29.6602405Z #43 0.589 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T10:21:29.6603786Z #43 0.589 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d6/4f/756a66c608a767c7af7010b23992343e97558ce7f86c5c15929f1215f6ef/mistral_common-1.8.4-py3-none-any.whl.metadata 2025-09-07T10:21:29.6605142Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: pydantic>=2.7, <3.0 2025-09-07T10:21:29.6605898Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: jsonschema>=4.21.1 2025-09-07T10:21:29.6606695Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: typing-extensions>=4.11.0 2025-09-07T10:21:29.6607522Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: tiktoken>=0.7.0 2025-09-07T10:21:29.6608236Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: 
pillow>=10.3.0 2025-09-07T10:21:29.6608963Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: requests>=2.0.0 2025-09-07T10:21:29.6609663Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: numpy>=1.25 2025-09-07T10:21:29.6610500Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: pydantic-extra-types[pycountry]>=2.10.5 2025-09-07T10:21:29.6611627Z #43 0.589 DEBUG Searching for a compatible version of mistral-common[audio] (==1.8.4) 2025-09-07T10:21:29.6612410Z #43 0.589 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T10:21:29.6613220Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: soundfile>=0.12.1 2025-09-07T10:21:29.6613954Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: soxr>=0.5.0 2025-09-07T10:21:29.6614676Z #43 0.589 DEBUG Searching for a compatible version of mistral-common[image] (>=1.8.2) 2025-09-07T10:21:29.6615489Z #43 0.589 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T10:21:29.6616330Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common==1.8.4 2025-09-07T10:21:29.6617189Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common[image]==1.8.4 2025-09-07T10:21:29.6617974Z #43 0.589 DEBUG Searching for a compatible version of mistral-common[image] (==1.8.4) 2025-09-07T10:21:29.6618770Z #43 0.589 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T10:21:29.6619637Z #43 0.589 DEBUG Adding transitive dependency for mistral-common==1.8.4: opencv-python-headless>=4.0.0 2025-09-07T10:21:29.6620466Z #43 0.589 DEBUG Searching for a compatible version of opencv-python-headless (>=4.11.0) 2025-09-07T10:21:29.6621235Z #43 0.589 DEBUG Found stale response for: https://pypi.org/simple/pydantic-extra-types/ 
2025-09-07T10:21:29.6622342Z #43 0.589 DEBUG Selecting: opencv-python-headless==4.12.0.88 [compatible] (opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.6623614Z #43 0.589 DEBUG Sending revalidation request for: https://pypi.org/simple/pydantic-extra-types/ 2025-09-07T10:21:29.6624577Z #43 0.589 DEBUG Adding transitive dependency for opencv-python-headless==4.12.0.88: numpy{python_full_version >= '3.9'}>=2, <2.3.0 2025-09-07T10:21:29.6625583Z #43 0.589 DEBUG Searching for a compatible version of numpy{python_full_version >= '3.9'} (>=2, <2.3.0) 2025-09-07T10:21:29.6626330Z #43 0.589 DEBUG Found stale response for: https://pypi.org/simple/soundfile/ 2025-09-07T10:21:29.6627147Z #43 0.589 DEBUG Selecting: numpy==2.2.6 [compatible] (numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6628024Z #43 0.589 DEBUG Sending revalidation request for: https://pypi.org/simple/soundfile/ 2025-09-07T10:21:29.6628725Z #43 0.589 DEBUG Adding transitive dependency for numpy==2.2.6: numpy==2.2.6 2025-09-07T10:21:29.6629476Z #43 0.589 DEBUG Adding transitive dependency for numpy==2.2.6: numpy{python_full_version >= '3.9'}==2.2.6 2025-09-07T10:21:29.6630316Z #43 0.589 DEBUG Searching for a compatible version of numpy{python_full_version >= '3.9'} (==2.2.6) 2025-09-07T10:21:29.6631214Z #43 0.589 DEBUG Selecting: numpy==2.2.6 [compatible] (numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6632010Z #43 0.589 DEBUG Searching for a compatible version of pyyaml (>=5.1) 2025-09-07T10:21:29.6632796Z #43 0.589 DEBUG Selecting: pyyaml==6.0.2 [compatible] (PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6633713Z #43 0.589 DEBUG Searching for a compatible version of six{python_full_version >= '3.12'} (>=1.16.0) 2025-09-07T10:21:29.6634475Z #43 0.589 DEBUG Found stale response for: 
https://pypi.org/simple/jsonschema/ 2025-09-07T10:21:29.6635165Z #43 0.589 DEBUG Sending revalidation request for: https://pypi.org/simple/jsonschema/ 2025-09-07T10:21:29.6635875Z #43 0.589 DEBUG Selecting: six==1.17.0 [compatible] (six-1.17.0-py2.py3-none-any.whl) 2025-09-07T10:21:29.6636499Z #43 0.589 DEBUG Adding transitive dependency for six==1.17.0: six==1.17.0 2025-09-07T10:21:29.6637213Z #43 0.589 DEBUG Adding transitive dependency for six==1.17.0: six{python_full_version >= '3.12'}==1.17.0 2025-09-07T10:21:29.6637895Z #43 0.589 DEBUG Searching for a compatible version of six (==1.17.0) 2025-09-07T10:21:29.6638504Z #43 0.589 DEBUG Selecting: six==1.17.0 [compatible] (six-1.17.0-py2.py3-none-any.whl) 2025-09-07T10:21:29.6639798Z #43 0.589 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl.metadata 2025-09-07T10:21:29.6641134Z #43 0.589 DEBUG Searching for a compatible version of six{python_full_version >= '3.12'} (==1.17.0) 2025-09-07T10:21:29.6641873Z #43 0.589 DEBUG Selecting: six==1.17.0 [compatible] (six-1.17.0-py2.py3-none-any.whl) 2025-09-07T10:21:29.6642693Z #43 0.589 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (>=77.0.3, <80) 2025-09-07T10:21:29.6643745Z #43 0.589 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=77.0.3, <80 2025-09-07T10:21:29.6644626Z #43 0.589 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:21:29.6645259Z #43 0.589 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T10:21:29.6646114Z #43 0.589 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T10:21:29.6646911Z #43 0.589 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T10:21:29.6647799Z #43 
0.589 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:21:29.6648668Z #43 0.589 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:21:29.6649897Z #43 0.589 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:21:29.6651023Z #43 0.590 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T10:21:29.6652145Z #43 0.590 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:21:29.6653026Z #43 0.590 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:21:29.6653596Z #43 0.590 DEBUG Searching for a compatible version of einops (*) 2025-09-07T10:21:29.6654212Z #43 0.590 DEBUG Selecting: einops==0.8.1 [compatible] (einops-0.8.1-py3-none-any.whl) 2025-09-07T10:21:29.6654868Z #43 0.590 DEBUG Searching for a compatible version of cloudpickle (*) 2025-09-07T10:21:29.6655628Z #43 0.590 DEBUG Selecting: cloudpickle==3.1.1 [compatible] (cloudpickle-3.1.1-py3-none-any.whl) 2025-09-07T10:21:29.6656343Z #43 0.590 DEBUG Searching for a compatible version of watchfiles (*) 2025-09-07T10:21:29.6657217Z #43 0.590 DEBUG Selecting: watchfiles==1.1.0 [compatible] (watchfiles-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6658099Z #43 0.590 DEBUG Found stale response for: https://pypi.org/simple/soxr/ 2025-09-07T10:21:29.6658764Z #43 0.590 DEBUG Sending revalidation request for: https://pypi.org/simple/soxr/ 2025-09-07T10:21:29.6659444Z #43 0.590 DEBUG Adding transitive dependency for watchfiles==1.1.0: anyio>=3.0.0 2025-09-07T10:21:29.6660115Z #43 0.590 DEBUG Searching for a compatible version of python-json-logger (*) 2025-09-07T10:21:29.6660910Z #43 0.590 DEBUG Selecting: 
python-json-logger==3.3.0 [compatible] (python_json_logger-3.3.0-py3-none-any.whl) 2025-09-07T10:21:29.6661691Z #43 0.590 DEBUG Searching for a compatible version of scipy (*) 2025-09-07T10:21:29.6662592Z #43 0.590 DEBUG Selecting: scipy==1.16.1 [compatible] (scipy-1.16.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.6663434Z #43 0.590 DEBUG Adding transitive dependency for scipy==1.16.1: numpy>=1.25.2, <2.6 2025-09-07T10:21:29.6664046Z #43 0.590 DEBUG Searching for a compatible version of ninja (*) 2025-09-07T10:21:29.6664788Z #43 0.590 DEBUG Selecting: ninja==1.13.0 [compatible] (ninja-1.13.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.6665560Z #43 0.590 DEBUG Searching for a compatible version of pybase64 (*) 2025-09-07T10:21:29.6666549Z #43 0.590 DEBUG Selecting: pybase64==1.4.2 [compatible] (pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T10:21:29.6667508Z #43 0.590 DEBUG Searching for a compatible version of cbor2 (*) 2025-09-07T10:21:29.6668376Z #43 0.590 DEBUG Selecting: cbor2==5.7.0 [compatible] (cbor2-5.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.6669262Z #43 0.590 DEBUG Searching for a compatible version of setproctitle (*) 2025-09-07T10:21:29.6670331Z #43 0.590 DEBUG Selecting: setproctitle==1.3.7 [compatible] (setproctitle-1.3.7-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T10:21:29.6671330Z #43 0.590 DEBUG Searching for a compatible version of openai-harmony (>=0.0.3) 2025-09-07T10:21:29.6672232Z #43 0.590 DEBUG Selecting: openai-harmony==0.0.4 [compatible] (openai_harmony-0.0.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6673186Z #43 0.590 DEBUG Adding transitive dependency for openai-harmony==0.0.4: pydantic>=2.11.7 2025-09-07T10:21:29.6673923Z #43 0.590 DEBUG Found 
not-modified response for: https://pypi.org/simple/soundfile/ 2025-09-07T10:21:29.6674581Z #43 0.590 DEBUG Searching for a compatible version of ray[cgraph] (>=2.48.0) 2025-09-07T10:21:29.6675293Z #43 0.590 DEBUG Selecting: ray==2.49.1 [compatible] (ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6675978Z #43 0.590 DEBUG Adding transitive dependency for ray==2.49.1: ray==2.49.1 2025-09-07T10:21:29.6676614Z #43 0.590 DEBUG Adding transitive dependency for ray==2.49.1: ray[cgraph]==2.49.1 2025-09-07T10:21:29.6677214Z #43 0.590 DEBUG Searching for a compatible version of ray (==2.49.1) 2025-09-07T10:21:29.6677883Z #43 0.590 DEBUG Selecting: ray==2.49.1 [compatible] (ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6678669Z #43 0.590 DEBUG Found not-modified response for: https://pypi.org/simple/jsonschema/ 2025-09-07T10:21:29.6679430Z #43 0.591 DEBUG Found not-modified response for: https://pypi.org/simple/pydantic-extra-types/ 2025-09-07T10:21:29.6680823Z #43 0.591 DEBUG No cache entry for: https://files.pythonhosted.org/packages/00/02/c81260c0f94bd34a1442ea488bdd433dfc9e6ed6211c9a59bc4157b8e00e/ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6682881Z #43 0.591 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/57/5e/70bdd9579b35003a489fc850b5047beeda26328053ebadc1fb60f320f7db/soundfile-0.13.1-py2.py3-none-manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.6684269Z #43 0.591 DEBUG Found not-modified response for: https://pypi.org/simple/soxr/ 2025-09-07T10:21:29.6685568Z #43 0.591 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/bf/9c/8c95d856233c1f82500c2450b8c68576b4cf1c871db3afac5c34ff84e6fd/jsonschema-4.25.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.6687660Z #43 0.592 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/e1/1a/569ea0420a0c4801c2c8dd40d8d544989522f6014d51def689125f3f2935/soxr-0.5.0.post1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6689096Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: click>=7.0 2025-09-07T10:21:29.6689715Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: filelock* 2025-09-07T10:21:29.6690293Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: jsonschema* 2025-09-07T10:21:29.6691019Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: msgpack>=1.0.0, <2.0.0 2025-09-07T10:21:29.6691889Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: packaging* 2025-09-07T10:21:29.6692521Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: protobuf>=3.20.3 2025-09-07T10:21:29.6693141Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: pyyaml* 2025-09-07T10:21:29.6693725Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: requests* 2025-09-07T10:21:29.6694342Z #43 0.616 DEBUG Searching for a compatible version of ray[cgraph] (==2.49.1) 2025-09-07T10:21:29.6695056Z #43 0.616 DEBUG Selecting: ray==2.49.1 [compatible] (ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.6695895Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: cupy-cuda12x{sys_platform != 'darwin'}* 2025-09-07T10:21:29.6696635Z #43 0.616 DEBUG Searching for a compatible version of interegular (>=0.3.2) 2025-09-07T10:21:29.6697367Z #43 0.616 DEBUG Selecting: interegular==0.3.3 [compatible] (interegular-0.3.3-py37-none-any.whl) 2025-09-07T10:21:29.6698123Z #43 0.616 DEBUG Searching for a compatible version of packaging (>=20.0) 2025-09-07T10:21:29.6698759Z #43 0.616 DEBUG Found installed version of packaging==25.0 that satisfies >=20.0 2025-09-07T10:21:29.6699379Z #43 0.616 DEBUG Selecting: packaging==25.0 [installed] (installed) 2025-09-07T10:21:29.6699942Z #43 0.616 DEBUG Searching for a compatible 
version of torch (>=1.10.0) 2025-09-07T10:21:29.6700526Z #43 0.616 DEBUG No cache entry for: https://pypi.org/simple/msgpack/ 2025-09-07T10:21:29.6701633Z #43 0.616 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=1.10.0 2025-09-07T10:21:29.6702761Z #43 0.616 DEBUG Selecting: torch==2.9.0.dev20250901+cu129 [installed] (installed) 2025-09-07T10:21:29.6703508Z #43 0.616 DEBUG No cache entry for: https://pypi.org/simple/cupy-cuda12x/ 2025-09-07T10:21:29.6704169Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: filelock* 2025-09-07T10:21:29.6704982Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: typing-extensions>=4.10.0 2025-09-07T10:21:29.6705926Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: setuptools{python_full_version >= '3.12'}* 2025-09-07T10:21:29.6706860Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: sympy>=1.13.3 2025-09-07T10:21:29.6707644Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: networkx>=2.5.1 2025-09-07T10:21:29.6708387Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: jinja2* 2025-09-07T10:21:29.6709136Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: fsspec>=0.8.5 2025-09-07T10:21:29.6710314Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T10:21:29.6711837Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:21:29.6713349Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: 
nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:21:29.6714817Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T10:21:29.6716290Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.1.4, <12.9.1.4+ 2025-09-07T10:21:29.6717372Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/click/ 2025-09-07T10:21:29.6718392Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.4.1.4, <11.4.1.4+ 2025-09-07T10:21:29.6719872Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.10.19, <10.3.10.19+ 2025-09-07T10:21:29.6720968Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/click/ 2025-09-07T10:21:29.6722059Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.5.82, <11.7.5.82+ 2025-09-07T10:21:29.6723583Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.10.65, <12.5.10.65+ 2025-09-07T10:21:29.6725121Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T10:21:29.6726564Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 
2025-09-07T10:21:29.6728005Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T10:21:29.6729430Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:21:29.6730965Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T10:21:29.6732675Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.14.1.1, <1.14.1.1+ 2025-09-07T10:21:29.6734016Z #43 0.617 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T10:21:29.6734928Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/sympy/ 2025-09-07T10:21:29.6735635Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/sympy/ 2025-09-07T10:21:29.6736301Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/fsspec/ 2025-09-07T10:21:29.6736972Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/fsspec/ 2025-09-07T10:21:29.6737712Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:21:29.6738571Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:21:29.6739406Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:21:29.6740238Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:21:29.6741076Z #43 0.617 DEBUG Found stale response for: 
https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:21:29.6741890Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:21:29.6742801Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:21:29.6743559Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:21:29.6744302Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:21:29.6745101Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:21:29.6745847Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:21:29.6746602Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:21:29.6747348Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:21:29.6748111Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:21:29.6749016Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/networkx/ 2025-09-07T10:21:29.6749872Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/networkx/ 2025-09-07T10:21:29.6750630Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:21:29.6751432Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:21:29.6752246Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:21:29.6753055Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:21:29.6753927Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:21:29.6754755Z #43 0.618 DEBUG Sending 
revalidation request for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:21:29.6755541Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:21:29.6756317Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:21:29.6757090Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:21:29.6757888Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:21:29.6758673Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:21:29.6759432Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:21:29.6760223Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:21:29.6761026Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:21:29.6761915Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:21:29.6762726Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:21:29.6763461Z #43 0.618 DEBUG Found stale response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T10:21:29.6764190Z #43 0.618 DEBUG Sending revalidation request for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T10:21:29.6764904Z #43 0.619 DEBUG Found not-modified response for: https://pypi.org/simple/click/ 2025-09-07T10:21:29.6765622Z #43 0.619 DEBUG Found not-modified response for: https://pypi.org/simple/sympy/ 2025-09-07T10:21:29.6766297Z #43 0.619 DEBUG Found not-modified response for: https://pypi.org/simple/fsspec/ 2025-09-07T10:21:29.6767037Z #43 0.620 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:21:29.6767976Z #43 0.620 DEBUG Found 
installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T10:21:29.6769139Z #43 0.620 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T10:21:29.6770767Z #43 0.620 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T10:21:29.6772202Z #43 0.620 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:21:29.6773101Z #43 0.620 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12==12.9.86 2025-09-07T10:21:29.6774361Z #43 0.620 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T10:21:29.6775472Z #43 0.620 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.9.86) 2025-09-07T10:21:29.6776736Z #43 0.620 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:21:29.6777966Z #43 0.620 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:21:29.6778874Z #43 0.620 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T10:21:29.6780326Z #43 0.620 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:21:29.6781872Z #43 0.620 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 
'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T10:21:29.6783480Z #43 0.620 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:21:29.6784668Z #43 0.620 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:21:29.6785410Z #43 0.621 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:21:29.6786213Z #43 0.621 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:21:29.6787016Z #43 0.621 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:21:29.6787827Z #43 0.621 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:21:29.6788911Z #43 0.621 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:21:29.6790695Z #43 0.621 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:21:29.6792021Z #43 0.621 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.6792919Z #43 0.621 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12==12.9.79 2025-09-07T10:21:29.6794215Z #43 0.621 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:21:29.6795415Z #43 0.621 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.9.79) 2025-09-07T10:21:29.6796714Z #43 0.621 DEBUG Found installed version of 
nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.6797968Z #43 0.621 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.6799241Z #43 0.621 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.6800806Z #43 0.621 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:21:29.6802345Z #43 0.621 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.6803785Z #43 0.621 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.6804773Z #43 0.621 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:21:29.6806223Z #43 0.621 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:21:29.6807369Z #43 0.621 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.6808195Z #43 0.621 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12==12.9.79 2025-09-07T10:21:29.6809424Z #43 0.621 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:21:29.6810522Z #43 0.621 DEBUG Searching for a compatible 
version of nvidia-cuda-cupti-cu12 (==12.9.79) 2025-09-07T10:21:29.6811965Z #43 0.621 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.6813108Z #43 0.621 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.6814245Z #43 0.622 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.6815668Z #43 0.622 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:21:29.6817083Z #43 0.622 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.6818209Z #43 0.622 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.6819210Z #43 0.622 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T10:21:29.6820710Z #43 0.622 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T10:21:29.6821834Z #43 0.622 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:21:29.6822786Z #43 0.622 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T10:21:29.6823912Z #43 0.622 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T10:21:29.6824976Z #43 0.622 DEBUG Searching 
for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T10:21:29.6826061Z #43 0.622 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:21:29.6827507Z #43 0.622 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:21:29.6828573Z #43 0.622 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:21:29.6829322Z #43 0.622 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T10:21:29.6830317Z #43 0.622 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T10:21:29.6831667Z #43 0.622 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:21:29.6832703Z #43 0.622 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:21:29.6833718Z #43 0.622 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T10:21:29.6834825Z #43 0.622 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T10:21:29.6835872Z #43 0.622 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.1.4, <12.9.1.4+) 2025-09-07T10:21:29.6837267Z #43 0.622 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.9.1.4, <12.9.1.4+ 2025-09-07T10:21:29.6838368Z #43 0.622 DEBUG 
Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:21:29.6839541Z #43 0.622 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12==12.9.1.4 2025-09-07T10:21:29.6840700Z #43 0.622 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.1.4 2025-09-07T10:21:29.6842055Z #43 0.622 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.9.1.4) 2025-09-07T10:21:29.6843333Z #43 0.622 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:21:29.6844841Z #43 0.622 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:21:29.6845921Z #43 0.622 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:21:29.6846877Z #43 0.622 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.1.4) 2025-09-07T10:21:29.6848243Z #43 0.622 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:21:29.6849492Z #43 0.622 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:21:29.6850328Z #43 0.623 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:21:29.6851216Z #43 0.623 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:21:29.6852032Z #43 0.623 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:21:29.6852859Z #43 0.623 DEBUG Found not-modified response for: 
https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:21:29.6853741Z #43 0.623 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:21:29.6854558Z #43 0.623 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:21:29.6855612Z #43 0.623 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.4.1.4, <11.4.1.4+) 2025-09-07T10:21:29.6857155Z #43 0.623 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.4.1.4, <11.4.1.4+ 2025-09-07T10:21:29.6858372Z #43 0.623 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:21:29.6859153Z #43 0.623 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12==11.4.1.4 2025-09-07T10:21:29.6860354Z #43 0.623 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.4.1.4 2025-09-07T10:21:29.6861400Z #43 0.623 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.4.1.4) 2025-09-07T10:21:29.6862604Z #43 0.623 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:21:29.6863772Z #43 0.623 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:21:29.6864522Z #43 0.627 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:21:29.6865320Z #43 0.627 DEBUG Found not-modified response for: https://pypi.org/simple/networkx/ 2025-09-07T10:21:29.6866078Z #43 0.627 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:21:29.6866880Z #43 
0.627 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:21:29.6867697Z #43 0.627 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:21:29.6868526Z #43 0.627 DEBUG Found not-modified response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T10:21:29.6869767Z #43 0.627 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:21:29.6871047Z #43 0.627 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T10:21:29.6872100Z #43 0.627 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.4.1.4) 2025-09-07T10:21:29.6873552Z #43 0.627 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:21:29.6874719Z #43 0.627 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:21:29.6875494Z #43 0.627 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T10:21:29.6876606Z #43 0.627 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.10.19, <10.3.10.19+) 2025-09-07T10:21:29.6878112Z #43 0.627 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.10.19, <10.3.10.19+ 2025-09-07T10:21:29.6879278Z #43 0.627 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:21:29.6880096Z #43 0.627 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12==10.3.10.19 
2025-09-07T10:21:29.6881307Z #43 0.627 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.10.19 2025-09-07T10:21:29.6882421Z #43 0.628 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.10.19) 2025-09-07T10:21:29.6883558Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:21:29.6884668Z #43 0.628 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:21:29.6885556Z #43 0.628 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T10:21:29.6886956Z #43 0.628 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T10:21:29.6888580Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:21:29.6889987Z #43 0.628 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.10.19) 2025-09-07T10:21:29.6891456Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:21:29.6892564Z #43 0.628 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:21:29.6893570Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.5.82, <11.7.5.82+) 2025-09-07T10:21:29.6895684Z #43 0.628 DEBUG Found installed 
version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.5.82, <11.7.5.82+ 2025-09-07T10:21:29.6897422Z #43 0.628 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:21:29.6898386Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12==11.7.5.82 2025-09-07T10:21:29.6899646Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.5.82 2025-09-07T10:21:29.6900750Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.5.82) 2025-09-07T10:21:29.6901930Z #43 0.628 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:21:29.6903076Z #43 0.628 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:21:29.6904208Z #43 0.628 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:21:29.6906149Z #43 0.628 DEBUG No cache entry for: https://files.pythonhosted.org/packages/4d/ec/fd869e2567cc9c01278a736cfd1697941ba0d4b81a43e0aa2e8d71dab208/msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.6907735Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T10:21:29.6908671Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T10:21:29.6909593Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T10:21:29.6910669Z #43 0.628 DEBUG Searching for 
a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.5.82) 2025-09-07T10:21:29.6912089Z #43 0.628 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:21:29.6913276Z #43 0.628 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:21:29.6914453Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T10:21:29.6915963Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T10:21:29.6933425Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T10:21:29.6934870Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.10.65, <12.5.10.65+) 2025-09-07T10:21:29.6937729Z #43 0.628 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.10.65, <12.5.10.65+ 2025-09-07T10:21:29.6940225Z #43 0.628 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:21:29.6942128Z #43 0.628 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T10:21:29.6943502Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12==12.5.10.65 2025-09-07T10:21:29.6944851Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.10.65 
2025-09-07T10:21:29.6945972Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.10.65) 2025-09-07T10:21:29.6947266Z #43 0.628 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:21:29.6949520Z #43 0.628 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:21:29.6951270Z #43 0.628 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:21:29.6953033Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T10:21:29.6954584Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.10.65) 2025-09-07T10:21:29.6956110Z #43 0.628 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:21:29.6957375Z #43 0.628 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:21:29.6958194Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T10:21:29.6959325Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T10:21:29.6961118Z #43 0.628 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T10:21:29.6962817Z #43 0.628 DEBUG Selecting: 
nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:21:29.6963662Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T10:21:29.6964958Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T10:21:29.6966129Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T10:21:29.6968157Z #43 0.628 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:21:29.6970220Z #43 0.628 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:21:29.6971622Z #43 0.628 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:21:29.6972614Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T10:21:29.6974009Z #43 0.628 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:21:29.6975285Z #43 0.628 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:21:29.6976266Z #43 0.628 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T10:21:29.6977732Z #43 0.628 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T10:21:29.6979289Z #43 0.628 DEBUG Selecting: 
nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:21:29.6980067Z #43 0.628 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T10:21:29.6981189Z #43 0.628 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T10:21:29.6982221Z #43 0.628 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T10:21:29.6983485Z #43 0.628 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:21:29.6985117Z #43 0.628 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:21:29.6986851Z #43 0.628 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:21:29.6987941Z #43 0.628 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T10:21:29.6989361Z #43 0.628 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:21:29.6990534Z #43 0.628 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:21:29.6991501Z #43 0.628 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T10:21:29.6993083Z #43 0.628 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T10:21:29.6994284Z #43 0.628 DEBUG Selecting: 
nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:21:29.6995153Z #43 0.628 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T10:21:29.6996827Z #43 0.628 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T10:21:29.6998418Z #43 0.628 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T10:21:29.7000270Z #43 0.628 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:21:29.7001972Z #43 0.628 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:21:29.7003517Z #43 0.628 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:21:29.7004894Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T10:21:29.7006363Z #43 0.629 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:21:29.7007618Z #43 0.629 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:21:29.7009302Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:21:29.7011280Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 
2025-09-07T10:21:29.7012457Z #43 0.629 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.7013222Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12==12.9.79 2025-09-07T10:21:29.7014405Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:21:29.7015438Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.9.79) 2025-09-07T10:21:29.7016587Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.7018297Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.7019406Z #43 0.629 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.7020335Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:21:29.7022784Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:21:29.7024049Z #43 0.629 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:21:29.7025049Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T10:21:29.7027052Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies 
>=12.9.86, <12.9.86+ 2025-09-07T10:21:29.7028328Z #43 0.629 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:21:29.7029245Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12==12.9.86 2025-09-07T10:21:29.7030861Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T10:21:29.7032221Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.9.86) 2025-09-07T10:21:29.7033558Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:21:29.7035355Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:21:29.7036585Z #43 0.629 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:21:29.7038154Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T10:21:29.7040177Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:21:29.7041485Z #43 0.629 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:21:29.7042506Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.14.1.1, <1.14.1.1+) 2025-09-07T10:21:29.7045250Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from 
file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.14.1.1, <1.14.1.1+ 2025-09-07T10:21:29.7046839Z #43 0.629 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:21:29.7047656Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12==1.14.1.1 2025-09-07T10:21:29.7049070Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.14.1.1 2025-09-07T10:21:29.7050161Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.14.1.1) 2025-09-07T10:21:29.7052075Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:21:29.7053802Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:21:29.7054998Z #43 0.629 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:21:29.7055957Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.14.1.1) 2025-09-07T10:21:29.7057428Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:21:29.7058609Z #43 0.629 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:21:29.7059451Z #43 0.629 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T10:21:29.7060998Z #43 0.629 DEBUG Found installed version of 
pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:21:29.7062336Z #43 0.629 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:21:29.7063293Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T10:21:29.7064540Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T10:21:29.7066175Z #43 0.629 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T10:21:29.7067672Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:21:29.7069010Z #43 0.629 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:21:29.7071047Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:21:29.7072879Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T10:21:29.7073813Z #43 0.629 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T10:21:29.7075308Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:21:29.7076806Z #43 0.629 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] 
(installed) 2025-09-07T10:21:29.7077604Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T10:21:29.7078574Z #43 0.629 DEBUG Searching for a compatible version of triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (*) 2025-09-07T10:21:29.7079625Z #43 0.629 DEBUG Selecting: triton==3.4.0 [compatible] (triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.7080488Z #43 0.629 DEBUG Adding transitive dependency for triton==3.4.0: triton==3.4.0 2025-09-07T10:21:29.7081393Z #43 0.629 DEBUG Adding transitive dependency for triton==3.4.0: triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0 2025-09-07T10:21:29.7082259Z #43 0.629 DEBUG Searching for a compatible version of triton (==3.4.0) 2025-09-07T10:21:29.7083089Z #43 0.629 DEBUG Selecting: triton==3.4.0 [compatible] (triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.7085629Z #43 0.629 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d0/66/b1eb52839f563623d185f0927eb3530ee4d5ffe9d377cdaf5346b306689e/triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.7087917Z #43 0.629 DEBUG Adding transitive dependency for triton==3.4.0: setuptools>=40.8.0 2025-09-07T10:21:29.7089360Z #43 0.629 DEBUG Searching for a compatible version of triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0) 2025-09-07T10:21:29.7090552Z #43 0.629 DEBUG Selecting: triton==3.4.0 [compatible] (triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.7091822Z #43 0.629 DEBUG Adding transitive dependency for triton==3.4.0: setuptools>=40.8.0 2025-09-07T10:21:29.7092465Z #43 0.629 DEBUG Searching for a compatible version of frozendict (*) 2025-09-07T10:21:29.7093164Z #43 0.629 DEBUG Selecting: frozendict==2.4.6 [compatible] 
(frozendict-2.4.6-py312-none-any.whl) 2025-09-07T10:21:29.7093849Z #43 0.629 DEBUG Searching for a compatible version of astor (*) 2025-09-07T10:21:29.7094464Z #43 0.629 DEBUG Selecting: astor==0.8.1 [compatible] (astor-0.8.1-py2.py3-none-any.whl) 2025-09-07T10:21:29.7095096Z #43 0.629 DEBUG Searching for a compatible version of dill (*) 2025-09-07T10:21:29.7095679Z #43 0.629 DEBUG Selecting: dill==0.4.0 [compatible] (dill-0.4.0-py3-none-any.whl) 2025-09-07T10:21:29.7096411Z #43 0.629 DEBUG Searching for a compatible version of llvmlite (>=0.44.0.dev0, <0.45) 2025-09-07T10:21:29.7097694Z #43 0.629 DEBUG Selecting: llvmlite==0.44.0 [compatible] (llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.7098143Z #43 0.629 DEBUG Searching for a compatible version of charset-normalizer (>=2, <4) 2025-09-07T10:21:29.7099203Z #43 0.629 DEBUG Selecting: charset-normalizer==3.4.3 [compatible] (charset_normalizer-3.4.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.7099432Z #43 0.630 DEBUG Searching for a compatible version of idna (>=2.5, <4) 2025-09-07T10:21:29.7099707Z #43 0.630 DEBUG Selecting: idna==3.10 [compatible] (idna-3.10-py3-none-any.whl) 2025-09-07T10:21:29.7099954Z #43 0.630 DEBUG Searching for a compatible version of urllib3 (>=1.21.1, <3) 2025-09-07T10:21:29.7100243Z #43 0.630 DEBUG Selecting: urllib3==2.5.0 [compatible] (urllib3-2.5.0-py3-none-any.whl) 2025-09-07T10:21:29.7100504Z #43 0.630 DEBUG Searching for a compatible version of certifi (>=2017.4.17) 2025-09-07T10:21:29.7100828Z #43 0.630 DEBUG Selecting: certifi==2025.8.3 [compatible] (certifi-2025.8.3-py3-none-any.whl) 2025-09-07T10:21:29.7101115Z #43 0.630 DEBUG Searching for a compatible version of huggingface-hub (>=0.34.0, <1.0) 2025-09-07T10:21:29.7101540Z #43 0.630 DEBUG Selecting: huggingface-hub==0.34.4 [compatible] (huggingface_hub-0.34.4-py3-none-any.whl) 2025-09-07T10:21:29.7101828Z #43 0.630 DEBUG 
Adding transitive dependency for huggingface-hub==0.34.4: filelock* 2025-09-07T10:21:29.7102148Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: fsspec>=2023.5.0 2025-09-07T10:21:29.7102477Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: packaging>=20.9 2025-09-07T10:21:29.7102773Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: pyyaml>=5.1 2025-09-07T10:21:29.7103063Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: requests* 2025-09-07T10:21:29.7103371Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: tqdm>=4.42.1 2025-09-07T10:21:29.7103740Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: typing-extensions>=3.7.4.3 2025-09-07T10:21:29.7104582Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}>=1.1.3, <2.0.0 2025-09-07T10:21:29.7104883Z #43 0.630 DEBUG Searching for a compatible version of safetensors (>=0.4.3) 2025-09-07T10:21:29.7105400Z #43 0.630 DEBUG Selecting: safetensors==0.6.2 [compatible] (safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.7105911Z #43 0.630 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=2023.5.0 2025-09-07T10:21:29.7106199Z #43 0.630 DEBUG Searching for a compatible version of starlette (>=0.40.0, <0.48.0) 2025-09-07T10:21:29.7106519Z #43 0.630 DEBUG Selecting: starlette==0.47.3 [compatible] (starlette-0.47.3-py3-none-any.whl) 2025-09-07T10:21:29.7106804Z #43 0.630 DEBUG Adding transitive dependency for starlette==0.47.3: anyio>=3.6.2, <5 2025-09-07T10:21:29.8294768Z #43 0.630 DEBUG Adding transitive dependency for starlette==0.47.3: typing-extensions{python_full_version < '3.13'}>=4.10.0 
2025-09-07T10:21:29.8295259Z #43 0.630 DEBUG Searching for a compatible version of typing-extensions{python_full_version < '3.13'} (>=4.10.0) 2025-09-07T10:21:29.8295955Z #43 0.630 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T10:21:29.8296216Z #43 0.630 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T10:21:29.8296603Z #43 0.630 DEBUG Adding transitive dependency for typing-extensions==4.14.1: typing-extensions==4.14.1 2025-09-07T10:21:29.8297314Z #43 0.630 DEBUG Adding transitive dependency for typing-extensions==4.14.1: typing-extensions{python_full_version < '3.13'}==4.14.1 2025-09-07T10:21:29.8297746Z #43 0.630 DEBUG Searching for a compatible version of typing-extensions{python_full_version < '3.13'} (==4.14.1) 2025-09-07T10:21:29.8298355Z #43 0.630 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies ==4.14.1 2025-09-07T10:21:29.8298708Z #43 0.630 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T10:21:29.8299002Z #43 0.630 DEBUG Searching for a compatible version of fastapi-cli[standard] (>=0.0.8) 2025-09-07T10:21:29.8299338Z #43 0.630 DEBUG Selecting: fastapi-cli==0.0.10 [compatible] (fastapi_cli-0.0.10-py3-none-any.whl) 2025-09-07T10:21:29.8299655Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: fastapi-cli==0.0.10 2025-09-07T10:21:29.8300039Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: fastapi-cli[standard]==0.0.10 2025-09-07T10:21:29.8300294Z #43 0.630 DEBUG Searching for a compatible version of fastapi-cli (==0.0.10) 2025-09-07T10:21:29.8300629Z #43 0.630 DEBUG Selecting: fastapi-cli==0.0.10 [compatible] (fastapi_cli-0.0.10-py3-none-any.whl) 2025-09-07T10:21:29.8301585Z #43 0.630 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/7c/62/0f00036925c0614e333a2baf739c861453a6779331ffb47ec9a6147f860b/fastapi_cli-0.0.10-py3-none-any.whl.metadata 2025-09-07T10:21:29.8301900Z #43 0.630 DEBUG Found stale response for: https://pypi.org/simple/hf-xet/ 2025-09-07T10:21:29.8302194Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: typer>=0.15.1 2025-09-07T10:21:29.8302501Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/hf-xet/ 2025-09-07T10:21:29.8302847Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: uvicorn[standard]>=0.15.0 2025-09-07T10:21:29.8303285Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: rich-toolkit>=0.14.8 2025-09-07T10:21:29.8303586Z #43 0.630 DEBUG Searching for a compatible version of fastapi-cli[standard] (==0.0.10) 2025-09-07T10:21:29.8303912Z #43 0.630 DEBUG Selecting: fastapi-cli==0.0.10 [compatible] (fastapi_cli-0.0.10-py3-none-any.whl) 2025-09-07T10:21:29.8304246Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: uvicorn[standard]>=0.15.0 2025-09-07T10:21:29.8304601Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: fastapi-cloud-cli>=0.1.1 2025-09-07T10:21:29.8304892Z #43 0.630 DEBUG Searching for a compatible version of httpx (>=0.23.0, <1) 2025-09-07T10:21:29.8305175Z #43 0.630 DEBUG Selecting: httpx==0.28.1 [compatible] (httpx-0.28.1-py3-none-any.whl) 2025-09-07T10:21:29.8305409Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: anyio* 2025-09-07T10:21:29.8305638Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: certifi* 2025-09-07T10:21:29.8305938Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: httpcore>=1.dev0, <2.dev0 2025-09-07T10:21:29.8306152Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: idna* 2025-09-07T10:21:29.8306383Z #43 0.630 DEBUG Searching for a compatible version of jinja2 (>=3.1.5) 2025-09-07T10:21:29.8306844Z #43 0.630 DEBUG 
Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies >=3.1.5 2025-09-07T10:21:29.8307040Z #43 0.630 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T10:21:29.8307311Z #43 0.630 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T10:21:29.8307580Z #43 0.630 DEBUG Searching for a compatible version of python-multipart (>=0.0.18) 2025-09-07T10:21:29.8307962Z #43 0.630 DEBUG Selecting: python-multipart==0.0.20 [compatible] (python_multipart-0.0.20-py3-none-any.whl) 2025-09-07T10:21:29.8308231Z #43 0.630 DEBUG Searching for a compatible version of email-validator (>=2.0.0) 2025-09-07T10:21:29.8308631Z #43 0.630 DEBUG Selecting: email-validator==2.3.0 [compatible] (email_validator-2.3.0-py3-none-any.whl) 2025-09-07T10:21:29.8308938Z #43 0.630 DEBUG Adding transitive dependency for email-validator==2.3.0: dnspython>=2.0.0 2025-09-07T10:21:29.8309233Z #43 0.630 DEBUG Adding transitive dependency for email-validator==2.3.0: idna>=2.0.0 2025-09-07T10:21:29.8309496Z #43 0.630 DEBUG Searching for a compatible version of uvicorn[standard] (>=0.15.0) 2025-09-07T10:21:29.8309823Z #43 0.630 DEBUG Selecting: uvicorn==0.35.0 [compatible] (uvicorn-0.35.0-py3-none-any.whl) 2025-09-07T10:21:29.8310103Z #43 0.630 DEBUG Adding transitive dependency for uvicorn==0.35.0: uvicorn==0.35.0 2025-09-07T10:21:29.8310413Z #43 0.630 DEBUG Adding transitive dependency for uvicorn==0.35.0: uvicorn[standard]==0.35.0 2025-09-07T10:21:29.8310642Z #43 0.630 DEBUG Searching for a compatible version of uvicorn (==0.35.0) 2025-09-07T10:21:29.8310944Z #43 0.630 DEBUG Selecting: uvicorn==0.35.0 [compatible] (uvicorn-0.35.0-py3-none-any.whl) 2025-09-07T10:21:29.8311238Z #43 0.630 DEBUG Found stale response for: https://pypi.org/simple/fastapi-cloud-cli/ 2025-09-07T10:21:29.8311571Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/fastapi-cloud-cli/ 2025-09-07T10:21:29.8311808Z #43 0.630 
DEBUG Found stale response for: https://pypi.org/simple/typer/ 2025-09-07T10:21:29.8312099Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/typer/ 2025-09-07T10:21:29.8312388Z #43 0.630 DEBUG Found stale response for: https://pypi.org/simple/httpcore/ 2025-09-07T10:21:29.8312681Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/httpcore/ 2025-09-07T10:21:29.8312956Z #43 0.630 DEBUG Found stale response for: https://pypi.org/simple/rich-toolkit/ 2025-09-07T10:21:29.8313259Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/rich-toolkit/ 2025-09-07T10:21:29.8313510Z #43 0.630 DEBUG Found stale response for: https://pypi.org/simple/dnspython/ 2025-09-07T10:21:29.8313828Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/dnspython/ 2025-09-07T10:21:29.8314718Z #43 0.630 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d2/e2/dc81b1bd1dcfe91735810265e9d26bc8ec5da45b4c0f6237e286819194c3/uvicorn-0.35.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.8314976Z #43 0.631 DEBUG Found stale response for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:21:29.8315237Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: click>=7.0 2025-09-07T10:21:29.8315472Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: h11>=0.8 2025-09-07T10:21:29.8315835Z #43 0.631 DEBUG Sending revalidation request for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:21:29.8316115Z #43 0.631 DEBUG Searching for a compatible version of uvicorn[standard] (==0.35.0) 2025-09-07T10:21:29.8316403Z #43 0.631 DEBUG Selecting: uvicorn==0.35.0 [compatible] (uvicorn-0.35.0-py3-none-any.whl) 2025-09-07T10:21:29.8316697Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: httptools>=0.6.3 2025-09-07T10:21:29.8316986Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: python-dotenv>=0.13 2025-09-07T10:21:29.8317241Z #43 0.631 DEBUG Adding 
transitive dependency for uvicorn==0.35.0: pyyaml>=5.1 2025-09-07T10:21:29.8317930Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'}>=0.15.1 2025-09-07T10:21:29.8318206Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: watchfiles>=0.13 2025-09-07T10:21:29.8318477Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: websockets>=10.4 2025-09-07T10:21:29.8318759Z #43 0.631 DEBUG Searching for a compatible version of aiohappyeyeballs (>=2.5.0) 2025-09-07T10:21:29.8319144Z #43 0.631 DEBUG Selecting: aiohappyeyeballs==2.6.1 [compatible] (aiohappyeyeballs-2.6.1-py3-none-any.whl) 2025-09-07T10:21:29.8319411Z #43 0.631 DEBUG Searching for a compatible version of aiosignal (>=1.4.0) 2025-09-07T10:21:29.8319719Z #43 0.631 DEBUG Selecting: aiosignal==1.4.0 [compatible] (aiosignal-1.4.0-py3-none-any.whl) 2025-09-07T10:21:29.8320012Z #43 0.631 DEBUG Adding transitive dependency for aiosignal==1.4.0: frozenlist>=1.1.0 2025-09-07T10:21:29.8320445Z #43 0.631 DEBUG Adding transitive dependency for aiosignal==1.4.0: typing-extensions{python_full_version < '3.13'}>=4.2 2025-09-07T10:21:29.8320701Z #43 0.631 DEBUG Searching for a compatible version of attrs (>=17.3.0) 2025-09-07T10:21:29.8320981Z #43 0.631 DEBUG Selecting: attrs==25.3.0 [compatible] (attrs-25.3.0-py3-none-any.whl) 2025-09-07T10:21:29.8321256Z #43 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/hf-xet/ 2025-09-07T10:21:29.8321489Z #43 0.631 DEBUG Searching for a compatible version of frozenlist (>=1.1.1) 2025-09-07T10:21:29.8322178Z #43 0.631 DEBUG Selecting: frozenlist==1.7.0 [compatible] (frozenlist-1.7.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8322424Z #43 0.632 DEBUG Searching for a compatible version of multidict (>=4.5, <7.0) 2025-09-07T10:21:29.8323012Z #43 0.632 
DEBUG Selecting: multidict==6.6.4 [compatible] (multidict-6.6.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8323312Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/httpcore/ 2025-09-07T10:21:29.8323580Z #43 0.632 DEBUG Searching for a compatible version of propcache (>=0.2.0) 2025-09-07T10:21:29.8324068Z #43 0.632 DEBUG Selecting: propcache==0.3.2 [compatible] (propcache-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8324356Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/typer/ 2025-09-07T10:21:29.8324589Z #43 0.632 DEBUG Searching for a compatible version of yarl (>=1.17.0, <2.0) 2025-09-07T10:21:29.8325017Z #43 0.632 DEBUG Selecting: yarl==1.20.1 [compatible] (yarl-1.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8325359Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/fastapi-cloud-cli/ 2025-09-07T10:21:29.8325588Z #43 0.632 DEBUG Adding transitive dependency for yarl==1.20.1: idna>=2.0 2025-09-07T10:21:29.8325833Z #43 0.632 DEBUG Adding transitive dependency for yarl==1.20.1: multidict>=4.0 2025-09-07T10:21:29.8326108Z #43 0.632 DEBUG Adding transitive dependency for yarl==1.20.1: propcache>=0.2.1 2025-09-07T10:21:29.8326337Z #43 0.632 DEBUG Searching for a compatible version of anyio (>=3.6.2, <5) 2025-09-07T10:21:29.8326637Z #43 0.632 DEBUG Selecting: anyio==4.10.0 [compatible] (anyio-4.10.0-py3-none-any.whl) 2025-09-07T10:21:29.8326928Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/dnspython/ 2025-09-07T10:21:29.8327167Z #43 0.632 DEBUG Adding transitive dependency for anyio==4.10.0: idna>=2.8 2025-09-07T10:21:29.8327407Z #43 0.632 DEBUG Adding transitive dependency for anyio==4.10.0: sniffio>=1.1 2025-09-07T10:21:29.8327819Z #43 0.632 DEBUG Adding transitive dependency for anyio==4.10.0: 
typing-extensions{python_full_version < '3.13'}>=4.5 2025-09-07T10:21:29.8328068Z #43 0.632 DEBUG Searching for a compatible version of distro (>=1.7.0, <2) 2025-09-07T10:21:29.8328341Z #43 0.632 DEBUG Selecting: distro==1.9.0 [compatible] (distro-1.9.0-py3-none-any.whl) 2025-09-07T10:21:29.8328572Z #43 0.632 DEBUG Searching for a compatible version of jiter (>=0.4.0, <1) 2025-09-07T10:21:29.8329022Z #43 0.632 DEBUG Selecting: jiter==0.10.0 [compatible] (jiter-0.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8329240Z #43 0.632 DEBUG Searching for a compatible version of sniffio (>=1.1) 2025-09-07T10:21:29.8329622Z #43 0.632 DEBUG Selecting: sniffio==1.3.1 [compatible] (sniffio-1.3.1-py3-none-any.whl) 2025-09-07T10:21:29.8329860Z #43 0.632 DEBUG Found stale response for: https://pypi.org/simple/h11/ 2025-09-07T10:21:29.8330151Z #43 0.632 DEBUG Searching for a compatible version of annotated-types (>=0.6.0) 2025-09-07T10:21:29.8330417Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/h11/ 2025-09-07T10:21:29.8330886Z #43 0.632 DEBUG Selecting: annotated-types==0.7.0 [compatible] (annotated_types-0.7.0-py3-none-any.whl) 2025-09-07T10:21:29.8331327Z #43 0.632 DEBUG Searching for a compatible version of typing-inspection (>=0.4.0) 2025-09-07T10:21:29.8331767Z #43 0.632 DEBUG Selecting: typing-inspection==0.4.1 [compatible] (typing_inspection-0.4.1-py3-none-any.whl) 2025-09-07T10:21:29.8332156Z #43 0.632 DEBUG Adding transitive dependency for typing-inspection==0.4.1: typing-extensions>=4.12.0 2025-09-07T10:21:29.8332400Z #43 0.632 DEBUG Searching for a compatible version of jsonschema (>=4.21.1) 2025-09-07T10:21:29.8332731Z #43 0.632 DEBUG Selecting: jsonschema==4.25.1 [compatible] (jsonschema-4.25.1-py3-none-any.whl) 2025-09-07T10:21:29.8333007Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: attrs>=22.2.0 2025-09-07T10:21:29.8333416Z #43 0.632 DEBUG Adding transitive dependency for 
jsonschema==4.25.1: jsonschema-specifications>=2023.3.6 2025-09-07T10:21:29.8333720Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: referencing>=0.28.4 2025-09-07T10:21:29.8334005Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: rpds-py>=0.7.1 2025-09-07T10:21:29.8334360Z #43 0.632 DEBUG Searching for a compatible version of pydantic-extra-types[pycountry] (>=2.10.5) 2025-09-07T10:21:29.8334825Z #43 0.632 DEBUG Selecting: pydantic-extra-types==2.10.5 [compatible] (pydantic_extra_types-2.10.5-py3-none-any.whl) 2025-09-07T10:21:29.8335236Z #43 0.632 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pydantic-extra-types==2.10.5 2025-09-07T10:21:29.8335714Z #43 0.632 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pydantic-extra-types[pycountry]==2.10.5 2025-09-07T10:21:29.8336007Z #43 0.632 DEBUG Searching for a compatible version of pydantic-extra-types (==2.10.5) 2025-09-07T10:21:29.8336435Z #43 0.632 DEBUG Selecting: pydantic-extra-types==2.10.5 [compatible] (pydantic_extra_types-2.10.5-py3-none-any.whl) 2025-09-07T10:21:29.8336757Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/rich-toolkit/ 2025-09-07T10:21:29.8337018Z #43 0.632 DEBUG Found stale response for: https://pypi.org/simple/httptools/ 2025-09-07T10:21:29.8337321Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/httptools/ 2025-09-07T10:21:29.8337615Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/python-dotenv/ 2025-09-07T10:21:29.8337965Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/python-dotenv/ 2025-09-07T10:21:29.8338208Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/uvloop/ 2025-09-07T10:21:29.8338508Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/uvloop/ 2025-09-07T10:21:29.8338843Z #43 0.633 DEBUG Found stale response for: 
https://pypi.org/simple/jsonschema-specifications/ 2025-09-07T10:21:29.8339225Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/jsonschema-specifications/ 2025-09-07T10:21:29.8339511Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/referencing/ 2025-09-07T10:21:29.8339825Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/referencing/ 2025-09-07T10:21:29.8340128Z #43 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:21:29.8341116Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/70/1a/5f4fd9e7285f10c44095a4f9fe17d0f358d1702a7c74a9278c794e8a7537/pydantic_extra_types-2.10.5-py3-none-any.whl.metadata 2025-09-07T10:21:29.8342038Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl.metadata 2025-09-07T10:21:29.8342380Z #43 0.633 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pydantic>=2.5.2 2025-09-07T10:21:29.8342786Z #43 0.633 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: typing-extensions* 2025-09-07T10:21:29.8343241Z #43 0.633 DEBUG Searching for a compatible version of pydantic-extra-types[pycountry] (==2.10.5) 2025-09-07T10:21:29.8343659Z #43 0.633 DEBUG Selecting: pydantic-extra-types==2.10.5 [compatible] (pydantic_extra_types-2.10.5-py3-none-any.whl) 2025-09-07T10:21:29.8344016Z #43 0.633 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pycountry>=23 2025-09-07T10:21:29.8344266Z #43 0.633 DEBUG Searching for a compatible version of soundfile (>=0.12.1) 2025-09-07T10:21:29.8344678Z #43 0.633 DEBUG Selecting: soundfile==0.13.1 [compatible] (soundfile-0.13.1-py2.py3-none-manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8344937Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/websockets/ 2025-09-07T10:21:29.8345252Z 
#43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/websockets/ 2025-09-07T10:21:29.8345496Z #43 0.633 DEBUG Adding transitive dependency for soundfile==0.13.1: cffi>=1.0 2025-09-07T10:21:29.8345731Z #43 0.633 DEBUG Adding transitive dependency for soundfile==0.13.1: numpy* 2025-09-07T10:21:29.8345954Z #43 0.633 DEBUG Searching for a compatible version of soxr (>=0.5.0) 2025-09-07T10:21:29.8346634Z #43 0.633 DEBUG Selecting: soxr==0.5.0.post1 [compatible] (soxr-0.5.0.post1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8347644Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/93/72/6b3e70d32e89a5cbb6a4513726c1ae8762165b027af569289e19ec08edd8/typer-0.17.4-py3-none-any.whl.metadata 2025-09-07T10:21:29.8347896Z #43 0.633 DEBUG Adding transitive dependency for soxr==0.5.0.post1: numpy* 2025-09-07T10:21:29.8348106Z #43 0.633 DEBUG Searching for a compatible version of click (>=7.0) 2025-09-07T10:21:29.8349416Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e5/a6/5aa862489a2918a096166fd98d9fe86b7fd53c607678b3fa9d8c432d88d5/fastapi_cloud_cli-0.1.5-py3-none-any.whl.metadata 2025-09-07T10:21:29.8349757Z #43 0.633 DEBUG Searching for a compatible version of click (>=7.0, <8.2.2 | >8.2.2) 2025-09-07T10:21:29.8350033Z #43 0.633 DEBUG Selecting: click==8.2.1 [compatible] (click-8.2.1-py3-none-any.whl) 2025-09-07T10:21:29.8350942Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/68/1b/e0a87d256e40e8c888847551b20a017a6b98139178505dc7ffb96f04e954/dnspython-2.7.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.8351965Z #43 0.634 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c8/49/42821d55ead7b5a87c8d121edf323cb393d8579f63e933002ade900b784f/rich_toolkit-0.15.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.8352673Z #43 0.634 DEBUG Found installed version of markupsafe==3.0.2 (from 
file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T10:21:29.8352946Z #43 0.634 DEBUG Found stale response for: https://pypi.org/simple/pycountry/ 2025-09-07T10:21:29.8353267Z #43 0.634 DEBUG Sending revalidation request for: https://pypi.org/simple/pycountry/ 2025-09-07T10:21:29.8353530Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/h11/ 2025-09-07T10:21:29.8353810Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/uvloop/ 2025-09-07T10:21:29.8354146Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/python-dotenv/ 2025-09-07T10:21:29.8354440Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/httptools/ 2025-09-07T10:21:29.8355336Z #43 0.634 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/85/32/10bb5764d90a8eee674e9dc6f4db6a0ab47c8c4d0d83c27f7c39ac415a4d/click-8.2.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.8355614Z #43 0.634 DEBUG Searching for a compatible version of msgpack (>=1.0.0, <2.0.0) 2025-09-07T10:21:29.8356139Z #43 0.634 DEBUG Selecting: msgpack==1.1.1 [compatible] (msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8356448Z #43 0.635 DEBUG Found not-modified response for: https://pypi.org/simple/referencing/ 2025-09-07T10:21:29.8356835Z #43 0.635 DEBUG Found not-modified response for: https://pypi.org/simple/jsonschema-specifications/ 2025-09-07T10:21:29.8357068Z #43 0.635 DEBUG Found stale response for: https://pypi.org/simple/cffi/ 2025-09-07T10:21:29.8357393Z #43 0.635 DEBUG Sending revalidation request for: https://pypi.org/simple/cffi/ 2025-09-07T10:21:29.8358281Z #43 0.635 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.8359204Z #43 0.635 DEBUG Found fresh response 
for: https://files.pythonhosted.org/packages/5f/ed/539768cf28c661b5b068d66d96a2f155c4971a5d55684a514c1a0e0dec2f/python_dotenv-1.1.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.8360492Z #43 0.635 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/f7/d8/b644c44acc1368938317d76ac991c9bba1166311880bcc0ac297cb9d6bd7/httptools-0.6.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8361534Z #43 0.635 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c1/b1/3baf80dc6d2b7bc27a95a67752d0208e410351e3feb4eb78de5f77454d8d/referencing-0.36.2-py3-none-any.whl.metadata 2025-09-07T10:21:29.8362609Z #43 0.635 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/01/0e/b27cdbaccf30b890c40ed1da9fd4a3593a5cf94dae54fb34f8a4b74fcd3f/jsonschema_specifications-2025.4.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.8362903Z #43 0.635 DEBUG Found not-modified response for: https://pypi.org/simple/websockets/ 2025-09-07T10:21:29.8363203Z #43 0.636 DEBUG Found not-modified response for: https://pypi.org/simple/pycountry/ 2025-09-07T10:21:29.8363535Z #43 0.637 DEBUG Searching for a compatible version of cupy-cuda12x{sys_platform != 'darwin'} (*) 2025-09-07T10:21:29.8363956Z #43 0.637 DEBUG Selecting: cupy-cuda12x==13.6.0 [compatible] (cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8364281Z #43 0.637 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: cupy-cuda12x==13.6.0 2025-09-07T10:21:29.8364701Z #43 0.637 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: cupy-cuda12x{sys_platform != 'darwin'}==13.6.0 2025-09-07T10:21:29.8364954Z #43 0.637 DEBUG Searching for a compatible version of cupy-cuda12x (==13.6.0) 2025-09-07T10:21:29.8365415Z #43 0.637 DEBUG Selecting: cupy-cuda12x==13.6.0 [compatible] (cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8366306Z #43 0.637 DEBUG Found 
fresh response for: https://files.pythonhosted.org/packages/b1/ec/1fb891d8a2660716aadb2143235481d15ed1cbfe3ad669194690b0604492/pycountry-24.6.1-py3-none-any.whl.metadata 2025-09-07T10:21:29.8367559Z #43 0.637 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/14/8f/aa61f528fba38578ec553c145857a181384c72b98156f858ca5c8e82d9d3/websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8368517Z #43 0.637 DEBUG No cache entry for: https://files.pythonhosted.org/packages/e0/95/d7e1295141e7d530674a3cc567e13ed0eb6b81524cb122d797ed996b5bea/cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8368783Z #43 0.637 DEBUG Found not-modified response for: https://pypi.org/simple/cffi/ 2025-09-07T10:21:29.8369024Z #43 0.637 DEBUG Found stale response for: https://pypi.org/simple/rpds-py/ 2025-09-07T10:21:29.8369325Z #43 0.637 DEBUG Sending revalidation request for: https://pypi.org/simple/rpds-py/ 2025-09-07T10:21:29.8370382Z #43 0.638 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b2/d5/da47df7004cb17e4955df6a43d14b3b4ae77737dff8bf7f8f333196717bf/cffi-1.17.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8370760Z #43 0.638 DEBUG Found not-modified response for: https://pypi.org/simple/rpds-py/ 2025-09-07T10:21:29.8372042Z #43 0.642 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ed/7b/8f4fee9ba1fb5ec856eb22d725a4efa3deb47f769597c809e03578b0f9d9/rpds_py-0.27.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8372349Z #43 0.644 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: numpy>=1.22, <2.6 2025-09-07T10:21:29.8372674Z #43 0.644 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: fastrlock>=0.5 2025-09-07T10:21:29.8373060Z #43 0.644 DEBUG Searching for a compatible version of 
cupy-cuda12x{sys_platform != 'darwin'} (==13.6.0) 2025-09-07T10:21:29.8373489Z #43 0.644 DEBUG Selecting: cupy-cuda12x==13.6.0 [compatible] (cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8373792Z #43 0.644 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: numpy>=1.22, <2.6 2025-09-07T10:21:29.8374102Z #43 0.644 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: fastrlock>=0.5 2025-09-07T10:21:29.8374330Z #43 0.644 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T10:21:29.8374804Z #43 0.644 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T10:21:29.8375047Z #43 0.644 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T10:21:29.8375334Z #43 0.644 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T10:21:29.8375569Z #43 0.644 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T10:21:29.8376065Z #43 0.644 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T10:21:29.8376266Z #43 0.644 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T10:21:29.8376503Z #43 0.644 DEBUG Searching for a compatible version of fsspec (>=2023.5.0) 2025-09-07T10:21:29.8377026Z #43 0.644 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=2023.5.0 2025-09-07T10:21:29.8377236Z #43 0.644 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T10:21:29.8377468Z #43 0.644 DEBUG No cache entry for: https://pypi.org/simple/fastrlock/ 2025-09-07T10:21:29.8378239Z #43 0.644 DEBUG Searching for a compatible version of hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (>=1.1.3, <2.0.0) 2025-09-07T10:21:29.8378721Z #43 0.644 DEBUG 
Selecting: hf-xet==1.1.9 [compatible] (hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8378976Z #43 0.644 DEBUG Found stale response for: https://pypi.org/simple/mpmath/ 2025-09-07T10:21:29.8379231Z #43 0.644 DEBUG Adding transitive dependency for hf-xet==1.1.9: hf-xet==1.1.9 2025-09-07T10:21:29.8379533Z #43 0.644 DEBUG Sending revalidation request for: https://pypi.org/simple/mpmath/ 2025-09-07T10:21:29.8380287Z #43 0.644 DEBUG Adding transitive dependency for hf-xet==1.1.9: hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}==1.1.9 2025-09-07T10:21:29.8380515Z #43 0.644 DEBUG Searching for a compatible version of hf-xet (==1.1.9) 2025-09-07T10:21:29.8380962Z #43 0.644 DEBUG Selecting: hf-xet==1.1.9 [compatible] (hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8382037Z #43 0.645 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/81/42/7e6955cf0621e87491a1fb8cad755d5c2517803cea174229b0ec00ff0166/hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8382784Z #43 0.645 DEBUG Searching for a compatible version of hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (==1.1.9) 2025-09-07T10:21:29.8383240Z #43 0.645 DEBUG Selecting: hf-xet==1.1.9 [compatible] (hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8383577Z #43 0.645 DEBUG Searching for a compatible version of typer (>=0.15.1) 2025-09-07T10:21:29.8383850Z #43 0.645 DEBUG Selecting: typer==0.17.4 [compatible] (typer-0.17.4-py3-none-any.whl) 2025-09-07T10:21:29.8384107Z #43 0.645 DEBUG Adding transitive dependency for typer==0.17.4: click>=8.0.0 2025-09-07T10:21:29.8384444Z #43 0.645 DEBUG Adding transitive dependency for typer==0.17.4: typing-extensions>=3.7.4.3 
2025-09-07T10:21:29.8384721Z #43 0.645 DEBUG Adding transitive dependency for typer==0.17.4: shellingham>=1.3.0 2025-09-07T10:21:29.8384975Z #43 0.645 DEBUG Adding transitive dependency for typer==0.17.4: rich>=10.11.0 2025-09-07T10:21:29.8385222Z #43 0.645 DEBUG Searching for a compatible version of rich-toolkit (>=0.14.8) 2025-09-07T10:21:29.8385560Z #43 0.645 DEBUG Selecting: rich-toolkit==0.15.1 [compatible] (rich_toolkit-0.15.1-py3-none-any.whl) 2025-09-07T10:21:29.8385851Z #43 0.645 DEBUG Adding transitive dependency for rich-toolkit==0.15.1: click>=8.1.7 2025-09-07T10:21:29.8386127Z #43 0.645 DEBUG Adding transitive dependency for rich-toolkit==0.15.1: rich>=13.7.1 2025-09-07T10:21:29.8386463Z #43 0.645 DEBUG Adding transitive dependency for rich-toolkit==0.15.1: typing-extensions>=4.12.2 2025-09-07T10:21:29.8386770Z #43 0.645 DEBUG Searching for a compatible version of fastapi-cloud-cli (>=0.1.1) 2025-09-07T10:21:29.8387148Z #43 0.645 DEBUG Selecting: fastapi-cloud-cli==0.1.5 [compatible] (fastapi_cloud_cli-0.1.5-py3-none-any.whl) 2025-09-07T10:21:29.8387451Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: typer>=0.12.3 2025-09-07T10:21:29.8387817Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: uvicorn[standard]>=0.15.0 2025-09-07T10:21:29.8388120Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: rignore>=0.5.1 2025-09-07T10:21:29.8388417Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: httpx>=0.27.0 2025-09-07T10:21:29.8388761Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: rich-toolkit>=0.14.5 2025-09-07T10:21:29.8389101Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: pydantic[email]>=1.6.1 2025-09-07T10:21:29.8389425Z #43 0.645 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: sentry-sdk>=2.20.0 2025-09-07T10:21:29.8389687Z #43 0.645 DEBUG Searching for a compatible version of 
pydantic[email] (>=1.6.1) 2025-09-07T10:21:29.8390030Z #43 0.645 DEBUG Selecting: pydantic==2.11.7 [compatible] (pydantic-2.11.7-py3-none-any.whl) 2025-09-07T10:21:29.8390308Z #43 0.645 DEBUG Adding transitive dependency for pydantic==2.11.7: pydantic==2.11.7 2025-09-07T10:21:29.8390611Z #43 0.645 DEBUG Adding transitive dependency for pydantic==2.11.7: pydantic[email]==2.11.7 2025-09-07T10:21:29.8390887Z #43 0.645 DEBUG Found stale response for: https://pypi.org/simple/shellingham/ 2025-09-07T10:21:29.8391150Z #43 0.645 DEBUG Searching for a compatible version of pydantic[email] (==2.11.7) 2025-09-07T10:21:29.8391453Z #43 0.645 DEBUG Sending revalidation request for: https://pypi.org/simple/shellingham/ 2025-09-07T10:21:29.8391764Z #43 0.645 DEBUG Selecting: pydantic==2.11.7 [compatible] (pydantic-2.11.7-py3-none-any.whl) 2025-09-07T10:21:29.8392067Z #43 0.645 DEBUG Adding transitive dependency for pydantic==2.11.7: email-validator>=2.0.0 2025-09-07T10:21:29.8392335Z #43 0.645 DEBUG Searching for a compatible version of httpcore (>=1.dev0, <2.dev0) 2025-09-07T10:21:29.8392643Z #43 0.645 DEBUG Selecting: httpcore==1.0.9 [compatible] (httpcore-1.0.9-py3-none-any.whl) 2025-09-07T10:21:29.8392877Z #43 0.645 DEBUG Adding transitive dependency for httpcore==1.0.9: certifi* 2025-09-07T10:21:29.8393113Z #43 0.645 DEBUG Adding transitive dependency for httpcore==1.0.9: h11>=0.16 2025-09-07T10:21:29.8393357Z #43 0.645 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T10:21:29.8394072Z #43 0.645 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T10:21:29.8394284Z #43 0.645 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T10:21:29.8394529Z #43 0.645 DEBUG Searching for a compatible version of dnspython (>=2.0.0) 2025-09-07T10:21:29.8394870Z #43 0.645 DEBUG Selecting: dnspython==2.7.0 [compatible] 
(dnspython-2.7.0-py3-none-any.whl) 2025-09-07T10:21:29.8395074Z #43 0.645 DEBUG Searching for a compatible version of h11 (>=0.16) 2025-09-07T10:21:29.8395322Z #43 0.645 DEBUG Selecting: h11==0.16.0 [compatible] (h11-0.16.0-py3-none-any.whl) 2025-09-07T10:21:29.8395567Z #43 0.645 DEBUG Searching for a compatible version of httptools (>=0.6.3) 2025-09-07T10:21:29.8396220Z #43 0.645 DEBUG Selecting: httptools==0.6.4 [compatible] (httptools-0.6.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8396470Z #43 0.645 DEBUG Searching for a compatible version of python-dotenv (>=0.13) 2025-09-07T10:21:29.8396823Z #43 0.645 DEBUG Selecting: python-dotenv==1.1.1 [compatible] (python_dotenv-1.1.1-py3-none-any.whl) 2025-09-07T10:21:29.8397083Z #43 0.645 DEBUG Found stale response for: https://pypi.org/simple/sentry-sdk/ 2025-09-07T10:21:29.8397725Z #43 0.645 DEBUG Searching for a compatible version of uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'} (>=0.15.1) 2025-09-07T10:21:29.8398078Z #43 0.645 DEBUG Sending revalidation request for: https://pypi.org/simple/sentry-sdk/ 2025-09-07T10:21:29.8398309Z #43 0.645 DEBUG Found stale response for: https://pypi.org/simple/rich/ 2025-09-07T10:21:29.8398579Z #43 0.645 DEBUG Sending revalidation request for: https://pypi.org/simple/rich/ 2025-09-07T10:21:29.8399050Z #43 0.645 DEBUG Selecting: uvloop==0.21.0 [compatible] (uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8399312Z #43 0.645 DEBUG Adding transitive dependency for uvloop==0.21.0: uvloop==0.21.0 2025-09-07T10:21:29.8399976Z #43 0.645 DEBUG Adding transitive dependency for uvloop==0.21.0: uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'}==0.21.0 2025-09-07T10:21:29.8400214Z #43 0.645 DEBUG Searching for a compatible version of uvloop (==0.21.0) 
2025-09-07T10:21:29.8400670Z #43 0.645 DEBUG Selecting: uvloop==0.21.0 [compatible] (uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8401769Z #43 0.645 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/06/a7/b4e6a19925c900be9f98bec0a75e6e8f79bb53bdeb891916609ab3958967/uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8402427Z #43 0.645 DEBUG Searching for a compatible version of uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'} (==0.21.0) 2025-09-07T10:21:29.8402878Z #43 0.645 DEBUG Selecting: uvloop==0.21.0 [compatible] (uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8403112Z #43 0.645 DEBUG Searching for a compatible version of websockets (>=10.4) 2025-09-07T10:21:29.8403796Z #43 0.645 DEBUG Selecting: websockets==15.0.1 [compatible] (websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8404119Z #43 0.645 DEBUG Searching for a compatible version of jsonschema-specifications (>=2023.3.6) 2025-09-07T10:21:29.8404625Z #43 0.645 DEBUG Selecting: jsonschema-specifications==2025.4.1 [compatible] (jsonschema_specifications-2025.4.1-py3-none-any.whl) 2025-09-07T10:21:29.8405032Z #43 0.645 DEBUG Adding transitive dependency for jsonschema-specifications==2025.4.1: referencing>=0.31.0 2025-09-07T10:21:29.8405278Z #43 0.645 DEBUG Searching for a compatible version of referencing (>=0.31.0) 2025-09-07T10:21:29.8405639Z #43 0.645 DEBUG Selecting: referencing==0.36.2 [compatible] (referencing-0.36.2-py3-none-any.whl) 2025-09-07T10:21:29.8405935Z #43 0.646 DEBUG Adding transitive dependency for referencing==0.36.2: attrs>=22.2.0 2025-09-07T10:21:29.8406218Z #43 0.646 DEBUG Adding transitive dependency for referencing==0.36.2: rpds-py>=0.7.0 2025-09-07T10:21:29.8406672Z #43 
0.646 DEBUG Adding transitive dependency for referencing==0.36.2: typing-extensions{python_full_version < '3.13'}>=4.4.0 2025-09-07T10:21:29.8406959Z #43 0.646 DEBUG Searching for a compatible version of rpds-py (>=0.7.1) 2025-09-07T10:21:29.8407410Z #43 0.646 DEBUG Selecting: rpds-py==0.27.1 [compatible] (rpds_py-0.27.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8407635Z #43 0.646 DEBUG Searching for a compatible version of pycountry (>=23) 2025-09-07T10:21:29.8407965Z #43 0.646 DEBUG Selecting: pycountry==24.6.1 [compatible] (pycountry-24.6.1-py3-none-any.whl) 2025-09-07T10:21:29.8408176Z #43 0.646 DEBUG Searching for a compatible version of cffi (>=1.0) 2025-09-07T10:21:29.8408600Z #43 0.646 DEBUG Selecting: cffi==1.17.1 [compatible] (cffi-1.17.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8408838Z #43 0.646 DEBUG Adding transitive dependency for cffi==1.17.1: pycparser* 2025-09-07T10:21:29.8409097Z #43 0.647 DEBUG Found stale response for: https://pypi.org/simple/rignore/ 2025-09-07T10:21:29.8409446Z #43 0.647 DEBUG Sending revalidation request for: https://pypi.org/simple/rignore/ 2025-09-07T10:21:29.8409715Z #43 0.647 DEBUG Found not-modified response for: https://pypi.org/simple/mpmath/ 2025-09-07T10:21:29.8410027Z #43 0.647 DEBUG Found not-modified response for: https://pypi.org/simple/shellingham/ 2025-09-07T10:21:29.8410318Z #43 0.647 DEBUG Found not-modified response for: https://pypi.org/simple/sentry-sdk/ 2025-09-07T10:21:29.8410573Z #43 0.648 DEBUG Found not-modified response for: https://pypi.org/simple/rich/ 2025-09-07T10:21:29.8410961Z #43 0.648 DEBUG Found stale response for: https://pypi.org/simple/pycparser/ 2025-09-07T10:21:29.8411434Z #43 0.648 DEBUG Sending revalidation request for: https://pypi.org/simple/pycparser/ 2025-09-07T10:21:29.8411666Z #43 0.648 DEBUG Searching for a compatible version of fastrlock (>=0.5) 2025-09-07T10:21:29.8412264Z #43 0.648 DEBUG Selecting: 
fastrlock==0.8.3 [compatible] (fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8412762Z #43 0.648 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T10:21:29.8413750Z #43 0.648 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e0/f9/0595336914c5619e5f28a1fb793285925a8cd4b432c9da0a987836c7f822/shellingham-1.5.4-py2.py3-none-any.whl.metadata 2025-09-07T10:21:29.8414922Z #43 0.648 DEBUG No cache entry for: https://files.pythonhosted.org/packages/80/07/cdecb7aa976f34328372f1c4efd6c9dc1b039b3cc8d3f38787d640009a25/fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:21:29.8415873Z #43 0.648 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/07/d5/f9f4a2bf5db2ca8f692c46f3821fee1f302f1b76a0e2914aee5390fca565/sentry_sdk-2.37.0-py2.py3-none-any.whl.metadata 2025-09-07T10:21:29.8416739Z #43 0.648 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e3/30/3c4d035596d3cf444529e0b2953ad0466f6049528a879d27534700580395/rich-14.1.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.8417040Z #43 0.649 DEBUG Found not-modified response for: https://pypi.org/simple/rignore/ 2025-09-07T10:21:29.8417340Z #43 0.650 DEBUG Found not-modified response for: https://pypi.org/simple/pycparser/ 2025-09-07T10:21:29.8418444Z #43 0.650 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/80/c8/b91afda10bd5ca1e3a80463340b899c0dc26a7750a9f3c94f668585c7f40/rignore-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T10:21:29.8419393Z #43 0.650 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/13/a3/a812df4e2dd5696d1f351d58b8fe16a405b234ad2886a0dab9183fb78109/pycparser-2.22-py3-none-any.whl.metadata 2025-09-07T10:21:29.8419642Z #43 0.658 DEBUG Searching for a 
compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T10:21:29.8420135Z #43 0.658 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T10:21:29.8420379Z #43 0.658 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T10:21:29.8420629Z #43 0.658 DEBUG Searching for a compatible version of shellingham (>=1.3.0) 2025-09-07T10:21:29.8420990Z #43 0.658 DEBUG Selecting: shellingham==1.5.4 [compatible] (shellingham-1.5.4-py2.py3-none-any.whl) 2025-09-07T10:21:29.8421224Z #43 0.658 DEBUG Searching for a compatible version of rich (>=13.7.1) 2025-09-07T10:21:29.8421491Z #43 0.658 DEBUG Selecting: rich==14.1.0 [compatible] (rich-14.1.0-py3-none-any.whl) 2025-09-07T10:21:29.8421784Z #43 0.658 DEBUG Adding transitive dependency for rich==14.1.0: markdown-it-py>=2.2.0 2025-09-07T10:21:29.8422090Z #43 0.658 DEBUG Adding transitive dependency for rich==14.1.0: pygments>=2.13.0, <3.0.0 2025-09-07T10:21:29.8422317Z #43 0.658 DEBUG Searching for a compatible version of rignore (>=0.5.1) 2025-09-07T10:21:29.8422931Z #43 0.658 DEBUG Selecting: rignore==0.6.4 [compatible] (rignore-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8423215Z #43 0.658 DEBUG Searching for a compatible version of sentry-sdk (>=2.20.0) 2025-09-07T10:21:29.8423546Z #43 0.658 DEBUG Selecting: sentry-sdk==2.37.0 [compatible] (sentry_sdk-2.37.0-py2.py3-none-any.whl) 2025-09-07T10:21:29.8423826Z #43 0.658 DEBUG Adding transitive dependency for sentry-sdk==2.37.0: urllib3>=1.26.11 2025-09-07T10:21:29.8424083Z #43 0.658 DEBUG Adding transitive dependency for sentry-sdk==2.37.0: certifi* 2025-09-07T10:21:29.8424294Z #43 0.658 DEBUG Searching for a compatible version of pycparser (*) 2025-09-07T10:21:29.8424598Z #43 0.658 DEBUG Selecting: pycparser==2.22 [compatible] (pycparser-2.22-py3-none-any.whl) 2025-09-07T10:21:29.8424846Z #43 0.658 DEBUG Found stale response for: 
https://pypi.org/simple/pygments/ 2025-09-07T10:21:29.8425153Z #43 0.658 DEBUG Sending revalidation request for: https://pypi.org/simple/pygments/ 2025-09-07T10:21:29.8425426Z #43 0.658 DEBUG Found stale response for: https://pypi.org/simple/markdown-it-py/ 2025-09-07T10:21:29.8425751Z #43 0.658 DEBUG Sending revalidation request for: https://pypi.org/simple/markdown-it-py/ 2025-09-07T10:21:29.8426101Z #43 0.659 DEBUG Found not-modified response for: https://pypi.org/simple/markdown-it-py/ 2025-09-07T10:21:29.8426360Z #43 0.659 DEBUG Searching for a compatible version of markdown-it-py (>=2.2.0) 2025-09-07T10:21:29.8426642Z #43 0.659 DEBUG Found not-modified response for: https://pypi.org/simple/pygments/ 2025-09-07T10:21:29.8426995Z #43 0.659 DEBUG Selecting: markdown-it-py==4.0.0 [compatible] (markdown_it_py-4.0.0-py3-none-any.whl) 2025-09-07T10:21:29.8427906Z #43 0.660 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/94/54/e7d793b573f298e1c9013b8c4dade17d481164aa517d1d7148619c2cedbf/markdown_it_py-4.0.0-py3-none-any.whl.metadata 2025-09-07T10:21:29.8428217Z #43 0.660 DEBUG Adding transitive dependency for markdown-it-py==4.0.0: mdurl>=0.1, <1.dev0 2025-09-07T10:21:29.8428488Z #43 0.660 DEBUG Searching for a compatible version of pygments (>=2.13.0, <3.0.0) 2025-09-07T10:21:29.8428793Z #43 0.660 DEBUG Selecting: pygments==2.19.2 [compatible] (pygments-2.19.2-py3-none-any.whl) 2025-09-07T10:21:29.8429669Z #43 0.660 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl.metadata 2025-09-07T10:21:29.8429914Z #43 0.660 DEBUG Found stale response for: https://pypi.org/simple/mdurl/ 2025-09-07T10:21:29.8430187Z #43 0.660 DEBUG Sending revalidation request for: https://pypi.org/simple/mdurl/ 2025-09-07T10:21:29.8430482Z #43 0.661 DEBUG Found not-modified response for: https://pypi.org/simple/mdurl/ 2025-09-07T10:21:29.8430732Z #43 
0.661 DEBUG Searching for a compatible version of mdurl (>=0.1, <1.dev0) 2025-09-07T10:21:29.8430996Z #43 0.661 DEBUG Selecting: mdurl==0.1.2 [compatible] (mdurl-0.1.2-py3-none-any.whl) 2025-09-07T10:21:29.8431861Z #43 0.661 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl.metadata 2025-09-07T10:21:29.8440332Z #43 0.661 DEBUG Tried 140 versions: aiohappyeyeballs 1, aiohttp 1, aiosignal 1, annotated-types 1, anyio 1, astor 1, attrs 1, blake3 1, cachetools 1, cbor2 1, certifi 1, cffi 1, charset-normalizer 1, click 1, cloudpickle 1, compressed-tensors 1, cupy-cuda12x 1, depyf 1, dill 1, diskcache 1, distro 1, dnspython 1, einops 1, email-validator 1, fastapi 1, fastapi-cli 1, fastapi-cloud-cli 1, fastrlock 1, filelock 1, frozendict 1, frozenlist 1, fsspec 1, gguf 1, h11 1, hf-xet 1, httpcore 1, httptools 1, httpx 1, huggingface-hub 1, idna 1, interegular 1, jinja2 1, jiter 1, jsonschema 1, jsonschema-specifications 1, lark 1, llguidance 1, llvmlite 1, lm-format-enforcer 1, markdown-it-py 1, markupsafe 1, mdurl 1, mistral-common 1, mpmath 1, msgpack 1, msgspec 1, multidict 1, networkx 1, ninja 1, numba 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, openai 1, openai-harmony 1, opencv-python-headless 1, outlines-core 1, packaging 1, partial-json-parser 1, pillow 1, prometheus-client 1, prometheus-fastapi-instrumentator 1, propcache 1, protobuf 1, psutil 1, py-cpuinfo 1, pybase64 1, pycountry 1, pycparser 1, pydantic 1, pydantic-core 1, pydantic-extra-types 1, pygments 1, python-dotenv 1, python-json-logger 1, python-multipart 1, pytorch-triton 
1, pyyaml 1, pyzmq 1, ray 1, referencing 1, regex 1, requests 1, rich 1, rich-toolkit 1, rignore 1, rpds-py 1, safetensors 1, scipy 1, sentencepiece 1, sentry-sdk 1, setproctitle 1, setuptools 1, shellingham 1, six 1, sniffio 1, soundfile 1, soxr 1, starlette 1, sympy 1, tiktoken 1, tokenizers 1, torch 1, tqdm 1, transformers 1, triton 1, typer 1, typing-extensions 1, typing-inspection 1, urllib3 1, uvicorn 1, uvloop 1, vllm 1, watchfiles 1, websockets 1, xgrammar 1, yarl 1 2025-09-07T10:21:29.8440588Z #43 0.661 DEBUG marker environment resolution took 0.213s 2025-09-07T10:21:29.8440715Z #43 0.663 Resolved 140 packages in 225ms 2025-09-07T10:21:29.8441293Z #43 0.663 DEBUG Requirement already installed: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8441531Z #43 0.664 DEBUG Registry requirement already cached: propcache==0.3.2 2025-09-07T10:21:29.8441757Z #43 0.664 DEBUG Registry requirement already cached: frozenlist==1.7.0 2025-09-07T10:21:29.8441998Z #43 0.664 DEBUG Registry requirement already cached: pydantic-core==2.33.2 2025-09-07T10:21:29.8442227Z #43 0.664 DEBUG Registry requirement already cached: fastapi==0.116.1 2025-09-07T10:21:29.8442435Z #43 0.664 DEBUG Identified uncached distribution: fastrlock==0.8.3 2025-09-07T10:21:29.8442657Z #43 0.664 DEBUG Registry requirement already cached: pycparser==2.22 2025-09-07T10:21:29.8442875Z #43 0.664 DEBUG Registry requirement already cached: typer==0.17.4 2025-09-07T10:21:29.8443067Z #43 0.664 DEBUG Requirement already installed: packaging==25.0 2025-09-07T10:21:29.8443290Z #43 0.664 DEBUG Registry requirement already cached: cachetools==6.2.0 2025-09-07T10:21:29.8443550Z #43 0.664 DEBUG Registry requirement already cached: annotated-types==0.7.0 2025-09-07T10:21:29.8444364Z #43 0.664 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from 
file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8444995Z #43 0.664 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) 2025-09-07T10:21:29.8445216Z #43 0.664 DEBUG Registry requirement already cached: pyzmq==27.0.2 2025-09-07T10:21:29.8445446Z #43 0.664 DEBUG Registry requirement already cached: numba==0.61.2 2025-09-07T10:21:29.8445668Z #43 0.664 DEBUG Registry requirement already cached: watchfiles==1.1.0 2025-09-07T10:21:29.8445898Z #43 0.664 DEBUG Registry requirement already cached: safetensors==0.6.2 2025-09-07T10:21:29.8446132Z #43 0.664 DEBUG Registry requirement already cached: starlette==0.47.3 2025-09-07T10:21:29.8446843Z #43 0.664 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.8447467Z #43 0.664 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8447683Z #43 0.664 DEBUG Registry requirement already cached: ninja==1.13.0 2025-09-07T10:21:29.8448346Z #43 0.664 DEBUG Requirement already installed: nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.8448593Z #43 0.664 DEBUG Registry requirement already cached: httpcore==1.0.9 2025-09-07T10:21:29.8449386Z #43 0.664 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:21:29.8449824Z #43 0.664 DEBUG Registry requirement already cached: aiohappyeyeballs==2.6.1 2025-09-07T10:21:29.8450047Z #43 0.664 DEBUG Registry requirement already cached: openai==1.106.1 
2025-09-07T10:21:29.8450332Z #43 0.664 DEBUG Registry requirement already cached: charset-normalizer==3.4.3 2025-09-07T10:21:29.8450572Z #43 0.664 DEBUG Registry requirement already cached: referencing==0.36.2 2025-09-07T10:21:29.8450850Z #43 0.664 DEBUG Registry requirement already cached: uvicorn==0.35.0 2025-09-07T10:21:29.8451074Z #43 0.664 DEBUG Registry requirement already cached: hf-xet==1.1.9 2025-09-07T10:21:29.8451808Z #43 0.664 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:21:29.8452033Z #43 0.664 DEBUG Registry requirement already cached: pybase64==1.4.2 2025-09-07T10:21:29.8452752Z #43 0.664 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T10:21:29.8452979Z #43 0.664 DEBUG Registry requirement already cached: tiktoken==0.11.0 2025-09-07T10:21:29.8453202Z #43 0.664 DEBUG Registry requirement already cached: aiosignal==1.4.0 2025-09-07T10:21:29.8453441Z #43 0.664 DEBUG Registry requirement already cached: aiohttp==3.12.15 2025-09-07T10:21:29.8453646Z #43 0.664 DEBUG Registry requirement already cached: click==8.2.1 2025-09-07T10:21:29.8453883Z #43 0.665 DEBUG Registry requirement already cached: sentry-sdk==2.37.0 2025-09-07T10:21:29.8454091Z #43 0.665 DEBUG Registry requirement already cached: distro==1.9.0 2025-09-07T10:21:29.8454342Z #43 0.665 DEBUG Registry requirement already cached: tokenizers==0.22.0 2025-09-07T10:21:29.8454963Z #43 0.665 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:21:29.8455234Z #43 0.665 DEBUG Registry requirement already cached: typing-inspection==0.4.1 2025-09-07T10:21:29.8455454Z #43 0.665 DEBUG Registry requirement already cached: cbor2==5.7.0 
2025-09-07T10:21:29.8455680Z #43 0.665 DEBUG Registry requirement already cached: certifi==2025.8.3 2025-09-07T10:21:29.8455966Z #43 0.665 DEBUG Registry requirement already cached: py-cpuinfo==9.0.0 2025-09-07T10:21:29.8456203Z #43 0.665 DEBUG Registry requirement already cached: requests==2.32.5 2025-09-07T10:21:29.8456413Z #43 0.665 DEBUG Registry requirement already cached: urllib3==2.5.0 2025-09-07T10:21:29.8456628Z #43 0.665 DEBUG Registry requirement already cached: msgspec==0.19.0 2025-09-07T10:21:29.8456847Z #43 0.665 DEBUG Registry requirement already cached: tqdm==4.67.1 2025-09-07T10:21:29.8457092Z #43 0.665 DEBUG Registry requirement already cached: yarl==1.20.1 2025-09-07T10:21:29.8457344Z #43 0.665 DEBUG Registry requirement already cached: openai-harmony==0.0.4 2025-09-07T10:21:29.8457557Z #43 0.665 DEBUG Registry requirement already cached: h11==0.16.0 2025-09-07T10:21:29.8457790Z #43 0.665 DEBUG Registry requirement already cached: shellingham==1.5.4 2025-09-07T10:21:29.8458546Z #43 0.665 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.8458856Z #43 0.665 DEBUG Registry requirement already cached: opencv-python-headless==4.12.0.88 2025-09-07T10:21:29.8459388Z #43 0.665 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T10:21:29.8459637Z #43 0.665 DEBUG Registry requirement already cached: transformers==4.56.1 2025-09-07T10:21:29.8460086Z #43 0.665 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T10:21:29.8460306Z #43 0.665 DEBUG Registry requirement already cached: astor==0.8.1 2025-09-07T10:21:29.8460581Z #43 0.665 DEBUG Registry requirement already cached: lm-format-enforcer==0.11.3 2025-09-07T10:21:29.8460803Z #43 0.665 DEBUG Registry requirement 
already cached: pygments==2.19.2 2025-09-07T10:21:29.8461063Z #43 0.665 DEBUG Registry requirement already cached: outlines-core==0.2.10 2025-09-07T10:21:29.8461276Z #43 0.665 DEBUG Registry requirement already cached: anyio==4.10.0 2025-09-07T10:21:29.8461512Z #43 0.665 DEBUG Registry requirement already cached: interegular==0.3.3 2025-09-07T10:21:29.8461736Z #43 0.665 DEBUG Registry requirement already cached: blake3==1.0.5 2025-09-07T10:21:29.8462003Z #43 0.665 DEBUG Registry requirement already cached: fastapi-cloud-cli==0.1.5 2025-09-07T10:21:29.8462516Z #43 0.665 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T10:21:29.8462752Z #43 0.665 DEBUG Registry requirement already cached: protobuf==6.32.0 2025-09-07T10:21:29.8462985Z #43 0.665 DEBUG Registry requirement already cached: scipy==1.16.1 2025-09-07T10:21:29.8463189Z #43 0.665 DEBUG Registry requirement already cached: pyyaml==6.0.2 2025-09-07T10:21:29.8463391Z #43 0.665 DEBUG Registry requirement already cached: mdurl==0.1.2 2025-09-07T10:21:29.8463623Z #43 0.665 DEBUG Registry requirement already cached: xgrammar==0.1.23 2025-09-07T10:21:29.8463848Z #43 0.666 DEBUG Identified uncached distribution: cupy-cuda12x==13.6.0 2025-09-07T10:21:29.8464090Z #43 0.666 DEBUG Registry requirement already cached: markdown-it-py==4.0.0 2025-09-07T10:21:29.8464314Z #43 0.666 DEBUG Registry requirement already cached: rpds-py==0.27.1 2025-09-07T10:21:29.8464529Z #43 0.666 DEBUG Registry requirement already cached: multidict==6.6.4 2025-09-07T10:21:29.8464833Z #43 0.666 DEBUG Registry requirement already cached: partial-json-parser==0.2.1.1.post6 2025-09-07T10:21:29.8465241Z #43 0.666 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T10:21:29.8465470Z #43 0.666 DEBUG Registry requirement already cached: websockets==15.0.1 2025-09-07T10:21:29.8465710Z #43 0.666 DEBUG Registry requirement already 
cached: python-dotenv==1.1.1 2025-09-07T10:21:29.8466133Z #43 0.666 DEBUG Requirement already installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T10:21:29.8466374Z #43 0.666 DEBUG Registry requirement already cached: sniffio==1.3.1 2025-09-07T10:21:29.8466594Z #43 0.666 DEBUG Registry requirement already cached: httptools==0.6.4 2025-09-07T10:21:29.8466854Z #43 0.666 DEBUG Registry requirement already cached: prometheus-client==0.22.1 2025-09-07T10:21:29.8467064Z #43 0.666 DEBUG Registry requirement already cached: lark==1.2.2 2025-09-07T10:21:29.8467657Z #43 0.666 DEBUG Requirement already installed: nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:21:29.8468033Z #43 0.666 DEBUG Registry requirement already cached: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T10:21:29.8468288Z #43 0.666 DEBUG Registry requirement already cached: sentencepiece==0.2.1 2025-09-07T10:21:29.8468483Z #43 0.666 DEBUG Registry requirement already cached: dill==0.4.0 2025-09-07T10:21:29.8468744Z #43 0.666 DEBUG Registry requirement already cached: python-json-logger==3.3.0 2025-09-07T10:21:29.8468980Z #43 0.666 DEBUG Registry requirement already cached: frozendict==2.4.6 2025-09-07T10:21:29.8469197Z #43 0.666 DEBUG Registry requirement already cached: pydantic==2.11.7 2025-09-07T10:21:29.8469401Z #43 0.666 DEBUG Registry requirement already cached: einops==0.8.1 2025-09-07T10:21:29.8470053Z #43 0.666 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.8470328Z #43 0.666 DEBUG Registry requirement already cached: fastapi-cli==0.0.10 2025-09-07T10:21:29.8470570Z #43 0.666 DEBUG Registry requirement already cached: mistral-common==1.8.4 2025-09-07T10:21:29.8471202Z #43 0.666 DEBUG Requirement already installed: markupsafe==3.0.2 (from 
file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:21:29.8471408Z #43 0.666 DEBUG Registry requirement already cached: httpx==0.28.1 2025-09-07T10:21:29.8472087Z #43 0.666 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.8472303Z #43 0.666 DEBUG Registry requirement already cached: rich==14.1.0 2025-09-07T10:21:29.8472502Z #43 0.666 DEBUG Registry requirement already cached: cffi==1.17.1 2025-09-07T10:21:29.8472753Z #43 0.666 DEBUG Registry requirement already cached: email-validator==2.3.0 2025-09-07T10:21:29.8473160Z #43 0.666 DEBUG Requirement already installed: sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T10:21:29.8473368Z #43 0.666 DEBUG Registry requirement already cached: rignore==0.6.4 2025-09-07T10:21:29.8473600Z #43 0.666 DEBUG Registry requirement already cached: jiter==0.10.0 2025-09-07T10:21:29.8473879Z #43 0.666 DEBUG Registry requirement already cached: pydantic-extra-types==2.10.5 2025-09-07T10:21:29.8474121Z #43 0.666 DEBUG Registry requirement already cached: llguidance==0.7.30 2025-09-07T10:21:29.8474323Z #43 0.667 DEBUG Registry requirement already cached: triton==3.4.0 2025-09-07T10:21:29.8474528Z #43 0.667 DEBUG Registry requirement already cached: psutil==7.0.0 2025-09-07T10:21:29.8474751Z #43 0.667 DEBUG Registry requirement already cached: uvloop==0.21.0 2025-09-07T10:21:29.8474962Z #43 0.667 DEBUG Registry requirement already cached: regex==2025.9.1 2025-09-07T10:21:29.8475149Z #43 0.667 DEBUG Identified uncached distribution: ray==2.49.1 2025-09-07T10:21:29.8475366Z #43 0.667 DEBUG Registry requirement already cached: gguf==0.17.1 2025-09-07T10:21:29.8475586Z #43 0.667 DEBUG Registry requirement already cached: soxr==0.5.0.post1 2025-09-07T10:21:29.8475783Z #43 0.667 DEBUG Registry requirement already cached: idna==3.10 
2025-09-07T10:21:29.8476108Z #43 0.667 DEBUG Registry requirement already cached: jsonschema-specifications==2025.4.1 2025-09-07T10:21:29.8476330Z #43 0.667 DEBUG Registry requirement already cached: diskcache==5.6.3 2025-09-07T10:21:29.8476984Z #43 0.667 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:21:29.8477195Z #43 0.667 DEBUG Registry requirement already cached: depyf==0.19.0 2025-09-07T10:21:29.8477439Z #43 0.667 DEBUG Registry requirement already cached: cloudpickle==3.1.1 2025-09-07T10:21:29.8477655Z #43 0.667 DEBUG Registry requirement already cached: llvmlite==0.44.0 2025-09-07T10:21:29.8477928Z #43 0.667 DEBUG Registry requirement already cached: compressed-tensors==0.11.0 2025-09-07T10:21:29.8478649Z #43 0.667 DEBUG Requirement already installed: nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:21:29.8479091Z #43 0.667 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T10:21:29.8479312Z #43 0.667 DEBUG Registry requirement already cached: dnspython==2.7.0 2025-09-07T10:21:29.8479552Z #43 0.667 DEBUG Registry requirement already cached: jsonschema==4.25.1 2025-09-07T10:21:29.8479788Z #43 0.667 DEBUG Registry requirement already cached: setproctitle==1.3.7 2025-09-07T10:21:29.8480040Z #43 0.667 DEBUG Registry requirement already cached: huggingface-hub==0.34.4 2025-09-07T10:21:29.8480220Z #43 0.669 DEBUG Requirement installed, but mismatched: 2025-09-07T10:21:29.8483424Z #43 0.669 Installed: Url(InstalledDirectUrlDist { name: PackageName("numpy"), version: "2.3.2", direct_url: ArchiveUrl { url: "file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", archive_info: ArchiveInfo { hash: None, hashes: None }, subdirectory: None }, url: 
DisplaySafeUrl { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", query: None, fragment: None }, editable: false, path: "/opt/python/cp312-cp312/lib/python3.12/site-packages/numpy-2.3.2.dist-info", cache_info: Some(CacheInfo { timestamp: Some(Timestamp(SystemTime { tv_sec: 1757226074, tv_nsec: 843199271 })), commit: None, tags: None, env: {}, directories: {} }) }) 2025-09-07T10:21:29.8485055Z #43 0.669 Requested: Registry { specifier: VersionSpecifiers([VersionSpecifier { operator: Equal, version: "2.2.6" }]), index: Some(IndexMetadata { url: Pypi(VerbatimUrl { url: DisplaySafeUrl { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("pypi.org")), port: None, path: "/simple", query: None, fragment: None }, given: None }), format: Simple }), conflict: None } 2025-09-07T10:21:29.8485269Z #43 0.669 DEBUG Registry requirement already cached: numpy==2.2.6 2025-09-07T10:21:29.8485499Z #43 0.669 DEBUG Identified uncached distribution: msgpack==1.1.1 2025-09-07T10:21:29.8485720Z #43 0.669 DEBUG Registry requirement already cached: attrs==25.3.0 2025-09-07T10:21:29.8486436Z #43 0.669 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:21:29.8486635Z #43 0.670 DEBUG Registry requirement already cached: six==1.17.0 2025-09-07T10:21:29.8486875Z #43 0.670 DEBUG Registry requirement already cached: pycountry==24.6.1 2025-09-07T10:21:29.8487292Z #43 0.670 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T10:21:29.8487536Z #43 0.670 DEBUG Registry requirement already cached: rich-toolkit==0.15.1 2025-09-07T10:21:29.8487775Z #43 0.670 DEBUG Registry requirement already cached: soundfile==0.13.1 
2025-09-07T10:21:29.8488409Z #43 0.670 DEBUG Requirement already installed: torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8489001Z #43 0.670 DEBUG Identified uncached distribution: vllm @ file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl 2025-09-07T10:21:29.8489279Z #43 0.670 DEBUG Registry requirement already cached: python-multipart==0.0.20 2025-09-07T10:21:29.8489457Z #43 0.670 DEBUG Unnecessary package: build==1.3.0 2025-09-07T10:21:29.8489620Z #43 0.670 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T10:21:29.8489771Z #43 0.670 DEBUG Preserving seed package: pip==25.2 2025-09-07T10:21:29.8489965Z #43 0.670 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T10:21:29.8490610Z #43 0.670 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250901+cu129 (from file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8491566Z #43 0.670 DEBUG Unnecessary package: torchvision==0.24.0.dev20250901+cu129 (from file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:21:29.8491752Z #43 0.670 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T10:21:29.8491898Z #43 0.670 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T10:21:29.8492860Z #43 0.671 DEBUG No cache entry for: https://files.pythonhosted.org/packages/e0/95/d7e1295141e7d530674a3cc567e13ed0eb6b81524cb122d797ed996b5bea/cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl 2025-09-07T10:21:29.8493774Z #43 0.671 DEBUG No cache entry for: https://files.pythonhosted.org/packages/00/02/c81260c0f94bd34a1442ea488bdd433dfc9e6ed6211c9a59bc4157b8e00e/ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl 2025-09-07T10:21:29.8494794Z #43 0.671 DEBUG No cache entry for: 
https://files.pythonhosted.org/packages/4d/ec/fd869e2567cc9c01278a736cfd1697941ba0d4b81a43e0aa2e8d71dab208/msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 2025-09-07T10:21:29.8495969Z #43 0.671 DEBUG No cache entry for: https://files.pythonhosted.org/packages/80/07/cdecb7aa976f34328372f1c4efd6c9dc1b039b3cc8d3f38787d640009a25/fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:21:29.8496113Z #43 0.685 Downloading cupy-cuda12x (107.7MiB) 2025-09-07T10:21:29.8496233Z #43 0.692 Downloading ray (66.9MiB) 2025-09-07T10:21:31.2793165Z #43 2.292 Downloading cupy-cuda12x 2025-09-07T10:21:31.5006007Z #43 2.363 Downloading ray 2025-09-07T10:21:32.3697658Z #43 3.382 Prepared 5 packages in 2.71s 2025-09-07T10:21:32.5571940Z #43 3.419 DEBUG Uninstalled numpy (907 files, 90 directories) 2025-09-07T10:21:32.5572441Z #43 3.419 Uninstalled 1 package in 36ms 2025-09-07T10:21:33.2636183Z #43 4.276 Installed 112 packages in 856ms 2025-09-07T10:21:33.2636750Z #43 4.276 + aiohappyeyeballs==2.6.1 2025-09-07T10:21:33.2637200Z #43 4.276 + aiohttp==3.12.15 2025-09-07T10:21:33.4208815Z #43 4.276 + aiosignal==1.4.0 2025-09-07T10:21:33.4209745Z #43 4.276 + annotated-types==0.7.0 2025-09-07T10:21:33.4210394Z #43 4.276 + anyio==4.10.0 2025-09-07T10:21:33.4210903Z #43 4.276 + astor==0.8.1 2025-09-07T10:21:33.4211191Z #43 4.276 + attrs==25.3.0 2025-09-07T10:21:33.4211463Z #43 4.276 + blake3==1.0.5 2025-09-07T10:21:33.4211933Z #43 4.276 + cachetools==6.2.0 2025-09-07T10:21:33.4212233Z #43 4.276 + cbor2==5.7.0 2025-09-07T10:21:33.4212526Z #43 4.276 + certifi==2025.8.3 2025-09-07T10:21:33.4212829Z #43 4.276 + cffi==1.17.1 2025-09-07T10:21:33.4213142Z #43 4.276 + charset-normalizer==3.4.3 2025-09-07T10:21:33.4213476Z #43 4.276 + click==8.2.1 2025-09-07T10:21:33.4213768Z #43 4.276 + cloudpickle==3.1.1 2025-09-07T10:21:33.4214091Z #43 4.276 + compressed-tensors==0.11.0 2025-09-07T10:21:33.4214452Z #43 4.276 + 
cupy-cuda12x==13.6.0 2025-09-07T10:21:33.4214786Z #43 4.276 + depyf==0.19.0 2025-09-07T10:21:33.4215062Z #43 4.277 + dill==0.4.0 2025-09-07T10:21:33.4215351Z #43 4.277 + diskcache==5.6.3 2025-09-07T10:21:33.4215644Z #43 4.277 + distro==1.9.0 2025-09-07T10:21:33.4215946Z #43 4.277 + dnspython==2.7.0 2025-09-07T10:21:33.4216239Z #43 4.277 + einops==0.8.1 2025-09-07T10:21:33.4216544Z #43 4.277 + email-validator==2.3.0 2025-09-07T10:21:33.4216872Z #43 4.277 + fastapi==0.116.1 2025-09-07T10:21:33.4217189Z #43 4.277 + fastapi-cli==0.0.10 2025-09-07T10:21:33.4217516Z #43 4.277 + fastapi-cloud-cli==0.1.5 2025-09-07T10:21:33.4217865Z #43 4.277 + fastrlock==0.8.3 2025-09-07T10:21:33.4218279Z #43 4.277 + frozendict==2.4.6 2025-09-07T10:21:33.4218592Z #43 4.277 + frozenlist==1.7.0 2025-09-07T10:21:33.4218901Z #43 4.277 + gguf==0.17.1 2025-09-07T10:21:33.4219176Z #43 4.277 + h11==0.16.0 2025-09-07T10:21:33.4219463Z #43 4.277 + hf-xet==1.1.9 2025-09-07T10:21:33.4219749Z #43 4.277 + httpcore==1.0.9 2025-09-07T10:21:33.4220067Z #43 4.277 + httptools==0.6.4 2025-09-07T10:21:33.4220423Z #43 4.277 + httpx==0.28.1 2025-09-07T10:21:33.4220735Z #43 4.277 + huggingface-hub==0.34.4 2025-09-07T10:21:33.4221056Z #43 4.277 + idna==3.10 2025-09-07T10:21:33.4221346Z #43 4.277 + interegular==0.3.3 2025-09-07T10:21:33.4221659Z #43 4.277 + jiter==0.10.0 2025-09-07T10:21:33.4221943Z #43 4.277 + jsonschema==4.25.1 2025-09-07T10:21:33.4222306Z #43 4.277 + jsonschema-specifications==2025.4.1 2025-09-07T10:21:33.4222788Z #43 4.277 + lark==1.2.2 2025-09-07T10:21:33.4223070Z #43 4.277 + llguidance==0.7.30 2025-09-07T10:21:33.4223360Z #43 4.277 + llvmlite==0.44.0 2025-09-07T10:21:33.4223680Z #43 4.277 + lm-format-enforcer==0.11.3 2025-09-07T10:21:33.4224024Z #43 4.277 + markdown-it-py==4.0.0 2025-09-07T10:21:33.4224343Z #43 4.277 + mdurl==0.1.2 2025-09-07T10:21:33.4224619Z #43 4.277 + mistral-common==1.8.4 2025-09-07T10:21:33.4224939Z #43 4.277 + msgpack==1.1.1 2025-09-07T10:21:33.4225228Z #43 4.277 + 
msgspec==0.19.0 2025-09-07T10:21:33.4225573Z #43 4.277 + multidict==6.6.4 2025-09-07T10:21:33.4225876Z #43 4.277 + ninja==1.13.0 2025-09-07T10:21:33.4226143Z #43 4.278 + numba==0.61.2 2025-09-07T10:21:33.4226741Z #43 4.278 - numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:21:33.4227351Z #43 4.278 + numpy==2.2.6 2025-09-07T10:21:33.4227631Z #43 4.278 + openai==1.106.1 2025-09-07T10:21:33.4227931Z #43 4.278 + openai-harmony==0.0.4 2025-09-07T10:21:33.4228298Z #43 4.278 + opencv-python-headless==4.12.0.88 2025-09-07T10:21:33.4228680Z #43 4.278 + outlines-core==0.2.10 2025-09-07T10:21:33.4229033Z #43 4.278 + partial-json-parser==0.2.1.1.post6 2025-09-07T10:21:33.4229421Z #43 4.278 + prometheus-client==0.22.1 2025-09-07T10:21:33.4229816Z #43 4.278 + prometheus-fastapi-instrumentator==7.1.0 2025-09-07T10:21:33.4230221Z #43 4.278 + propcache==0.3.2 2025-09-07T10:21:33.4230510Z #43 4.278 + protobuf==6.32.0 2025-09-07T10:21:33.4230807Z #43 4.278 + psutil==7.0.0 2025-09-07T10:21:33.4231090Z #43 4.278 + py-cpuinfo==9.0.0 2025-09-07T10:21:33.4231391Z #43 4.278 + pybase64==1.4.2 2025-09-07T10:21:33.4231680Z #43 4.278 + pycountry==24.6.1 2025-09-07T10:21:33.4231983Z #43 4.278 + pycparser==2.22 2025-09-07T10:21:33.4232321Z #43 4.278 + pydantic==2.11.7 2025-09-07T10:21:33.4232622Z #43 4.278 + pydantic-core==2.33.2 2025-09-07T10:21:33.4232969Z #43 4.278 + pydantic-extra-types==2.10.5 2025-09-07T10:21:33.4233311Z #43 4.278 + pygments==2.19.2 2025-09-07T10:21:33.4233624Z #43 4.278 + python-dotenv==1.1.1 2025-09-07T10:21:33.4233954Z #43 4.278 + python-json-logger==3.3.0 2025-09-07T10:21:33.4234315Z #43 4.278 + python-multipart==0.0.20 2025-09-07T10:21:33.4234635Z #43 4.278 + pyyaml==6.0.2 2025-09-07T10:21:33.4234919Z #43 4.278 + pyzmq==27.0.2 2025-09-07T10:21:33.4235184Z #43 4.278 + ray==2.49.1 2025-09-07T10:21:33.4235470Z #43 4.278 + referencing==0.36.2 2025-09-07T10:21:33.4235774Z #43 4.278 + regex==2025.9.1 
2025-09-07T10:21:33.4236054Z #43 4.278 + requests==2.32.5 2025-09-07T10:21:33.4236350Z #43 4.278 + rich==14.1.0 2025-09-07T10:21:33.4236624Z #43 4.279 + rich-toolkit==0.15.1 2025-09-07T10:21:33.4236937Z #43 4.279 + rignore==0.6.4 2025-09-07T10:21:33.4237214Z #43 4.279 + rpds-py==0.27.1 2025-09-07T10:21:33.4237517Z #43 4.279 + safetensors==0.6.2 2025-09-07T10:21:33.4237806Z #43 4.279 + scipy==1.16.1 2025-09-07T10:21:33.4238096Z #43 4.279 + sentencepiece==0.2.1 2025-09-07T10:21:33.4238402Z #43 4.279 + sentry-sdk==2.37.0 2025-09-07T10:21:33.4238713Z #43 4.279 + setproctitle==1.3.7 2025-09-07T10:21:33.4239031Z #43 4.279 + shellingham==1.5.4 2025-09-07T10:21:33.4239320Z #43 4.279 + six==1.17.0 2025-09-07T10:21:33.4239633Z #43 4.279 + sniffio==1.3.1 2025-09-07T10:21:33.4239912Z #43 4.279 + soundfile==0.13.1 2025-09-07T10:21:33.4240213Z #43 4.279 + soxr==0.5.0.post1 2025-09-07T10:21:33.4240500Z #43 4.279 + starlette==0.47.3 2025-09-07T10:21:33.4240796Z #43 4.279 + tiktoken==0.11.0 2025-09-07T10:21:33.4241084Z #43 4.279 + tokenizers==0.22.0 2025-09-07T10:21:33.4241385Z #43 4.279 + tqdm==4.67.1 2025-09-07T10:21:33.4241696Z #43 4.279 + transformers==4.56.1 2025-09-07T10:21:33.4242006Z #43 4.279 + triton==3.4.0 2025-09-07T10:21:33.4242288Z #43 4.279 + typer==0.17.4 2025-09-07T10:21:33.4242580Z #43 4.279 + typing-inspection==0.4.1 2025-09-07T10:21:33.4242913Z #43 4.279 + urllib3==2.5.0 2025-09-07T10:21:33.4243193Z #43 4.279 + uvicorn==0.35.0 2025-09-07T10:21:33.4243492Z #43 4.279 + uvloop==0.21.0 2025-09-07T10:21:33.4244304Z #43 4.279 + vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129 (from file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl) 2025-09-07T10:21:33.4245165Z #43 4.279 + watchfiles==1.1.0 2025-09-07T10:21:33.4245461Z #43 4.279 + websockets==15.0.1 2025-09-07T10:21:33.4245768Z #43 4.279 + xgrammar==0.1.23 2025-09-07T10:21:33.4246062Z #43 4.279 + yarl==1.20.1 2025-09-07T10:21:33.4246422Z #43 4.283 DEBUG Released lock at 
`/tmp/uv-281d6a3886c08524.lock` 2025-09-07T10:22:09.6106796Z #43 DONE 40.6s 2025-09-07T10:22:09.7637597Z 2025-09-07T10:22:09.7638538Z #44 [vllm-base 12/18] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system /wheels/xformers/*.whl --verbose 2025-09-07T10:22:10.1451279Z #44 0.532 DEBUG uv 0.8.4 2025-09-07T10:22:10.3205390Z #44 0.533 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T10:22:10.3208000Z #44 0.533 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T10:22:10.3208961Z #44 0.535 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at `/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T10:22:10.3209820Z #44 0.535 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T10:22:10.3210353Z #44 0.536 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T10:22:10.3211559Z #44 0.541 DEBUG At least one requirement is not satisfied: file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T10:22:10.3212410Z #44 0.541 DEBUG Using request timeout of 500s 2025-09-07T10:22:10.3212874Z #44 0.547 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T10:22:10.3213387Z #44 0.547 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T10:22:10.3214017Z #44 0.547 DEBUG Adding direct dependency: xformers* 2025-09-07T10:22:10.3215030Z #44 0.547 DEBUG Searching for a compatible version of xformers @ file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl (*) 2025-09-07T10:22:10.3216123Z #44 0.547 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: torch>=2.8 2025-09-07T10:22:10.3216936Z #44 0.547 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: numpy* 2025-09-07T10:22:10.3217648Z #44 0.548 DEBUG Found fresh response for: https://pypi.org/simple/torch/ 2025-09-07T10:22:10.3218244Z #44 0.548 DEBUG 
Searching for a compatible version of torch (>=2.8) 2025-09-07T10:22:10.3219315Z #44 0.548 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T10:22:10.3220903Z #44 0.548 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T10:22:10.3222015Z #44 0.548 DEBUG Selecting: torch==2.9.0.dev20250901+cu129 [installed] (installed) 2025-09-07T10:22:10.3222715Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: filelock* 2025-09-07T10:22:10.3223727Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: typing-extensions>=4.10.0 2025-09-07T10:22:10.3224669Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: setuptools{python_full_version >= '3.12'}* 2025-09-07T10:22:10.3225571Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: sympy>=1.13.3 2025-09-07T10:22:10.3226360Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: networkx>=2.5.1 2025-09-07T10:22:10.3227176Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: jinja2* 2025-09-07T10:22:10.3227933Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: fsspec>=0.8.5 2025-09-07T10:22:10.3229077Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T10:22:10.3230601Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:22:10.3232114Z #44 0.549 DEBUG Adding transitive dependency for 
torch==2.9.0.dev20250901+cu129: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:22:10.3233582Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T10:22:10.3235153Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.1.4, <12.9.1.4+ 2025-09-07T10:22:10.3236616Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.4.1.4, <11.4.1.4+ 2025-09-07T10:22:10.3238081Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.10.19, <10.3.10.19+ 2025-09-07T10:22:10.3239581Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.5.82, <11.7.5.82+ 2025-09-07T10:22:10.3241101Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.10.65, <12.5.10.65+ 2025-09-07T10:22:10.3242626Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T10:22:10.3244080Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T10:22:10.3245512Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 
'linux'}>=3.3.20, <3.3.20+ 2025-09-07T10:22:10.3246938Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:22:10.3248402Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T10:22:10.3250327Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.14.1.1, <1.14.1.1+ 2025-09-07T10:22:10.3251751Z #44 0.549 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T10:22:10.3252757Z #44 0.551 DEBUG Found fresh response for: https://pypi.org/simple/filelock/ 2025-09-07T10:22:10.3253449Z #44 0.551 DEBUG Found fresh response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:22:10.3254506Z #44 0.551 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T10:22:10.3255723Z #44 0.551 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T10:22:10.3256642Z #44 0.551 DEBUG Found fresh response for: https://pypi.org/simple/setuptools/ 2025-09-07T10:22:10.3257294Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/sympy/ 2025-09-07T10:22:10.3257913Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/networkx/ 2025-09-07T10:22:10.3258548Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/jinja2/ 2025-09-07T10:22:10.3259157Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/fsspec/ 2025-09-07T10:22:10.3259865Z #44 0.552 DEBUG Found fresh response for: 
https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:22:10.3260651Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:22:10.3261420Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:22:10.3262324Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:22:10.3263027Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:22:10.3264035Z #44 0.552 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T10:22:10.3265580Z #44 0.552 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T10:22:10.3266819Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:22:10.3267523Z #44 0.552 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:22:10.3268211Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:22:10.3269067Z #44 0.552 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12==12.9.86 2025-09-07T10:22:10.3269931Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:22:10.3271075Z #44 0.552 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T10:22:10.3272170Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:22:10.3272891Z #44 0.552 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.9.86) 
2025-09-07T10:22:10.3274121Z #44 0.552 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:22:10.3275316Z #44 0.552 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:22:10.3276026Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:22:10.3276760Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:22:10.3277475Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:22:10.3278173Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:22:10.3278889Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:22:10.3279603Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:22:10.3280330Z #44 0.552 DEBUG Found fresh response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T10:22:10.3281194Z #44 0.552 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T10:22:10.3282258Z #44 0.552 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T10:22:10.3283322Z #44 0.552 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T10:22:10.3284359Z #44 0.552 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T10:22:10.3285774Z #44 0.553 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from 
file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:22:10.3286934Z #44 0.553 DEBUG Found fresh response for: https://pypi.org/simple/numpy/ 2025-09-07T10:22:10.3287526Z #44 0.553 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T10:22:10.3288440Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T10:22:10.3289939Z #44 0.553 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:22:10.3291406Z #44 0.553 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:22:10.3292438Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:22:10.3294053Z #44 0.553 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:22:10.3295371Z #44 0.553 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3296272Z #44 0.553 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12==12.9.79 2025-09-07T10:22:10.3297580Z #44 0.553 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:22:10.3298786Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.9.79) 2025-09-07T10:22:10.3300078Z #44 0.553 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from 
file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3301356Z #44 0.553 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3302629Z #44 0.553 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3304238Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:22:10.3305751Z #44 0.553 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3306978Z #44 0.553 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3307965Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:22:10.3309445Z #44 0.553 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:22:10.3310590Z #44 0.553 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3311405Z #44 0.553 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12==12.9.79 2025-09-07T10:22:10.3312661Z #44 0.553 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:22:10.3313741Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 
(==12.9.79) 2025-09-07T10:22:10.3314873Z #44 0.553 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3315973Z #44 0.553 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3317061Z #44 0.553 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3318435Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:22:10.3319830Z #44 0.553 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3320927Z #44 0.553 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3321898Z #44 0.553 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T10:22:10.3323286Z #44 0.553 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T10:22:10.3324382Z #44 0.553 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:22:10.3325159Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T10:22:10.3326287Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T10:22:10.3327344Z #44 0.554 DEBUG Searching for a compatible version of 
nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T10:22:10.3328411Z #44 0.554 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:22:10.3329868Z #44 0.554 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:22:10.3331164Z #44 0.554 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:22:10.3331929Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T10:22:10.3332981Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T10:22:10.3334304Z #44 0.554 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T10:22:10.3335751Z #44 0.554 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:22:10.3336889Z #44 0.554 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:22:10.3337648Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T10:22:10.3338735Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.1.4, <12.9.1.4+) 2025-09-07T10:22:10.3340196Z #44 0.554 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.9.1.4, <12.9.1.4+ 2025-09-07T10:22:10.3341364Z #44 0.554 DEBUG Selecting: 
nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:22:10.3342300Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12==12.9.1.4 2025-09-07T10:22:10.3343451Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.1.4 2025-09-07T10:22:10.3344481Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.9.1.4) 2025-09-07T10:22:10.3345563Z #44 0.554 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:22:10.3347106Z #44 0.554 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:22:10.3348359Z #44 0.554 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:22:10.3349637Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.1.4) 2025-09-07T10:22:10.3350993Z #44 0.554 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:22:10.3352084Z #44 0.554 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:22:10.3353055Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.4.1.4, <11.4.1.4+) 2025-09-07T10:22:10.3354581Z #44 0.554 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.4.1.4, <11.4.1.4+ 2025-09-07T10:22:10.3355799Z #44 0.554 DEBUG 
Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:22:10.3356652Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12==11.4.1.4 2025-09-07T10:22:10.3357817Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.4.1.4 2025-09-07T10:22:10.3358871Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.4.1.4) 2025-09-07T10:22:10.3360064Z #44 0.554 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:22:10.3361847Z #44 0.554 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:22:10.3362985Z #44 0.554 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:22:10.3363728Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T10:22:10.3364748Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.4.1.4) 2025-09-07T10:22:10.3366197Z #44 0.554 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T10:22:10.3367832Z #44 0.554 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:22:10.3368963Z #44 0.554 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:22:10.3369741Z #44 0.554 DEBUG Adding transitive 
dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T10:22:10.3371064Z #44 0.554 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.10.19, <10.3.10.19+) 2025-09-07T10:22:10.3372543Z #44 0.554 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.10.19, <10.3.10.19+ 2025-09-07T10:22:10.3373704Z #44 0.554 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:22:10.3374535Z #44 0.554 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12==10.3.10.19 2025-09-07T10:22:10.3375739Z #44 0.554 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.10.19 2025-09-07T10:22:10.3376886Z #44 0.554 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.10.19) 2025-09-07T10:22:10.3378033Z #44 0.554 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:22:10.3379134Z #44 0.554 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:22:10.3380256Z #44 0.554 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:22:10.3381628Z #44 0.554 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.10.19) 2025-09-07T10:22:10.3383013Z #44 0.554 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 
2025-09-07T10:22:10.3384202Z #44 0.554 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:22:10.3385206Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.5.82, <11.7.5.82+) 2025-09-07T10:22:10.3386664Z #44 0.554 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.5.82, <11.7.5.82+ 2025-09-07T10:22:10.3387811Z #44 0.554 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:22:10.3388628Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12==11.7.5.82 2025-09-07T10:22:10.3389839Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.5.82 2025-09-07T10:22:10.3390904Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.5.82) 2025-09-07T10:22:10.3392041Z #44 0.554 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:22:10.3393139Z #44 0.554 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:22:10.3394372Z #44 0.554 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:22:10.3395538Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T10:22:10.3396402Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T10:22:10.3397429Z #44 0.554 DEBUG Adding transitive dependency for 
nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T10:22:10.3398515Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.5.82) 2025-09-07T10:22:10.3399945Z #44 0.554 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T10:22:10.3401543Z #44 0.554 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:22:10.3402650Z #44 0.554 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:22:10.3403420Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T10:22:10.3404307Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T10:22:10.3405241Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T10:22:10.3406342Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.10.65, <12.5.10.65+) 2025-09-07T10:22:10.3407917Z #44 0.554 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.10.65, <12.5.10.65+ 2025-09-07T10:22:10.3409185Z #44 0.554 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:22:10.3410037Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12==12.5.10.65 2025-09-07T10:22:10.3411540Z #44 0.554 DEBUG Adding transitive dependency for 
nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.10.65 2025-09-07T10:22:10.3412658Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.10.65) 2025-09-07T10:22:10.3413993Z #44 0.554 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:22:10.3415794Z #44 0.554 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:22:10.3417030Z #44 0.554 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:22:10.3417869Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T10:22:10.3418959Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.10.65) 2025-09-07T10:22:10.3420501Z #44 0.554 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:22:10.3421751Z #44 0.554 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:22:10.3422566Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T10:22:10.3423823Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T10:22:10.3425252Z #44 0.554 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from 
file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T10:22:10.3426369Z #44 0.554 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:22:10.3427324Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T10:22:10.3428481Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T10:22:10.3429524Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T10:22:10.3430608Z #44 0.554 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:22:10.3432045Z #44 0.554 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:22:10.3433277Z #44 0.554 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:22:10.3434242Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T10:22:10.3435599Z #44 0.554 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:22:10.3436678Z #44 0.554 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:22:10.3437609Z #44 0.554 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T10:22:10.3439038Z #44 0.554 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from 
file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T10:22:10.3440177Z #44 0.554 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:22:10.3440895Z #44 0.554 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T10:22:10.3442006Z #44 0.554 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T10:22:10.3442979Z #44 0.554 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T10:22:10.3444101Z #44 0.554 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:22:10.3445675Z #44 0.554 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:22:10.3446756Z #44 0.554 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:22:10.3447643Z #44 0.554 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T10:22:10.3449233Z #44 0.554 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:22:10.3450614Z #44 0.554 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:22:10.3451591Z #44 0.554 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T10:22:10.3453210Z #44 0.554 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from 
file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T10:22:10.3454430Z #44 0.554 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:22:10.3455240Z #44 0.554 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T10:22:10.3456466Z #44 0.554 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T10:22:10.3457548Z #44 0.554 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T10:22:10.3458763Z #44 0.554 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:22:10.3459933Z #44 0.554 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:22:10.3461116Z #44 0.554 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:22:10.3462663Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T10:22:10.3464130Z #44 0.555 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:22:10.3465265Z #44 0.555 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:22:10.3466184Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:22:10.3467612Z #44 0.555 DEBUG Found installed version 
of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:22:10.3468738Z #44 0.555 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3469566Z #44 0.555 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12==12.9.79 2025-09-07T10:22:10.3470631Z #44 0.555 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:22:10.3471631Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.9.79) 2025-09-07T10:22:10.3472884Z #44 0.555 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3474430Z #44 0.555 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3475501Z #44 0.555 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3476399Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:22:10.3477753Z #44 0.555 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:22:10.3478827Z #44 0.555 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:22:10.3479775Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T10:22:10.3481325Z #44 0.555 DEBUG Found installed version of 
nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T10:22:10.3482552Z #44 0.555 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:22:10.3483372Z #44 0.555 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12==12.9.86 2025-09-07T10:22:10.3484597Z #44 0.555 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T10:22:10.3485677Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.9.86) 2025-09-07T10:22:10.3486901Z #44 0.555 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:22:10.3488594Z #44 0.555 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:22:10.3489780Z #44 0.555 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:22:10.3490784Z #44 0.555 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T10:22:10.3492514Z #44 0.555 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:22:10.3493743Z #44 0.555 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:22:10.3494734Z #44 0.555 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} 
(>=1.14.1.1, <1.14.1.1+) 2025-09-07T10:22:10.3496278Z #44 0.555 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.14.1.1, <1.14.1.1+ 2025-09-07T10:22:10.3497509Z #44 0.555 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:22:10.3498306Z #44 0.555 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12==1.14.1.1 2025-09-07T10:22:10.3499493Z #44 0.555 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.14.1.1 2025-09-07T10:22:10.3500707Z #44 0.555 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.14.1.1) 2025-09-07T10:22:10.3501929Z #44 0.555 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:22:10.3503808Z #44 0.555 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:22:10.3504910Z #44 0.555 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:22:10.3505804Z #44 0.555 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.14.1.1) 2025-09-07T10:22:10.3507193Z #44 0.555 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:22:10.3508299Z #44 0.555 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:22:10.3509273Z #44 0.555 DEBUG Searching for a compatible version of 
pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T10:22:10.3510770Z #44 0.555 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:22:10.3512346Z #44 0.555 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:22:10.3513197Z #44 0.555 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T10:22:10.3514325Z #44 0.555 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T10:22:10.3515319Z #44 0.555 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T10:22:10.3516660Z #44 0.555 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:22:10.3517945Z #44 0.555 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:22:10.3519257Z #44 0.555 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:22:10.3520676Z #44 0.555 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T10:22:10.3521576Z #44 0.555 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T10:22:10.3523016Z #44 0.555 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that 
satisfies ==3.4.0+gitf7888497 2025-09-07T10:22:10.3524544Z #44 0.555 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=40.8.0 2025-09-07T10:22:10.3525489Z #44 0.555 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:22:10.3536980Z #44 0.555 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T10:22:10.3537834Z #44 0.555 DEBUG Searching for a compatible version of numpy (*) 2025-09-07T10:22:10.3538422Z #44 0.555 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T10:22:10.3539012Z #44 0.555 DEBUG Selecting: numpy==2.2.6 [installed] (installed) 2025-09-07T10:22:10.3539547Z #44 0.555 DEBUG Searching for a compatible version of filelock (*) 2025-09-07T10:22:10.3540483Z #44 0.555 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T10:22:10.3541306Z #44 0.555 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T10:22:10.3541941Z #44 0.555 DEBUG Searching for a compatible version of typing-extensions (>=4.10.0) 2025-09-07T10:22:10.3543081Z #44 0.555 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T10:22:10.3544038Z #44 0.555 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T10:22:10.3544774Z #44 0.555 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (*) 2025-09-07T10:22:10.3545737Z #44 0.555 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies * 2025-09-07T10:22:10.3546575Z #44 0.555 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:22:10.3547221Z #44 0.555 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 
2025-09-07T10:22:10.3548070Z #44 0.555 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T10:22:10.3549323Z #44 0.555 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T10:22:10.3550246Z #44 0.555 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:22:10.3551145Z #44 0.555 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:22:10.3552040Z #44 0.555 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:22:10.3553136Z #44 0.555 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T10:22:10.3554191Z #44 0.555 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:22:10.3555068Z #44 0.555 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:22:10.3555658Z #44 0.555 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T10:22:10.3556499Z #44 0.555 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T10:22:10.3557289Z #44 0.555 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T10:22:10.3557910Z #44 0.555 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T10:22:10.3558564Z #44 0.555 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T10:22:10.3559471Z #44 0.555 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T10:22:10.3560291Z #44 0.555 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T10:22:10.3560818Z #44 0.555 DEBUG Searching for a compatible version 
of jinja2 (*) 2025-09-07T10:22:10.3561708Z #44 0.555 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T10:22:10.3562451Z #44 0.555 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T10:22:10.3563035Z #44 0.555 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T10:22:10.3563645Z #44 0.555 DEBUG Searching for a compatible version of fsspec (>=0.8.5) 2025-09-07T10:22:10.3564484Z #44 0.555 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T10:22:10.3565309Z #44 0.555 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T10:22:10.3565885Z #44 0.556 DEBUG Found fresh response for: https://pypi.org/simple/mpmath/ 2025-09-07T10:22:10.3566792Z #44 0.556 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T10:22:10.3567635Z #44 0.556 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T10:22:10.3568492Z #44 0.556 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T10:22:10.3569297Z #44 0.556 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T10:22:10.3569872Z #44 0.556 DEBUG Found fresh response for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:22:10.3571205Z #44 0.556 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T10:22:10.3572278Z #44 0.556 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T10:22:10.3573359Z #44 0.556 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 
2025-09-07T10:22:10.3574415Z #44 0.556 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T10:22:10.3577148Z #44 0.556 DEBUG Tried 28 versions: filelock 1, fsspec 1, jinja2 1, markupsafe 1, mpmath 1, networkx 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, pytorch-triton 1, setuptools 1, sympy 1, torch 1, typing-extensions 1, xformers 1 2025-09-07T10:22:10.3579794Z #44 0.556 DEBUG marker environment resolution took 0.009s 2025-09-07T10:22:10.3580274Z #44 0.557 Resolved 28 packages in 12ms 2025-09-07T10:22:10.3581252Z #44 0.557 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:22:10.3582869Z #44 0.557 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T10:22:10.3584220Z #44 0.557 DEBUG Requirement already installed: nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:22:10.3585513Z #44 0.557 DEBUG Identified uncached distribution: xformers @ file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T10:22:10.3586892Z #44 0.557 DEBUG Requirement already installed: nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:22:10.3588393Z #44 0.557 DEBUG Requirement already installed: nvidia-cufile-cu12==1.14.1.1 (from 
file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:22:10.3589893Z #44 0.557 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:22:10.3590999Z #44 0.557 DEBUG Requirement already installed: sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T10:22:10.3592104Z #44 0.557 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) 2025-09-07T10:22:10.3593416Z #44 0.557 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:22:10.3594812Z #44 0.557 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:22:10.3596210Z #44 0.557 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:22:10.3597284Z #44 0.557 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T10:22:10.3598734Z #44 0.557 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:22:10.3600280Z #44 0.557 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:22:10.3601487Z #44 0.557 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T10:22:10.3602682Z #44 0.557 DEBUG Requirement already 
installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:22:10.3603831Z #44 0.557 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T10:22:10.3604563Z #44 0.557 DEBUG Requirement already installed: numpy==2.2.6 2025-09-07T10:22:10.3605632Z #44 0.557 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:22:10.3607124Z #44 0.557 DEBUG Requirement already installed: torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:22:10.3608562Z #44 0.557 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:22:10.3609858Z #44 0.557 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T10:22:10.3611113Z #44 0.557 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T10:22:10.3612183Z #44 0.557 DEBUG Requirement already installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T10:22:10.3613390Z #44 0.557 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T10:22:10.3614580Z #44 0.557 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T10:22:10.3615841Z #44 0.557 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 
2025-09-07T10:22:10.3616907Z #44 0.557 DEBUG Unnecessary package: pyyaml==6.0.2 2025-09-07T10:22:10.3617395Z #44 0.557 DEBUG Unnecessary package: aiohappyeyeballs==2.6.1 2025-09-07T10:22:10.3617878Z #44 0.557 DEBUG Unnecessary package: aiohttp==3.12.15 2025-09-07T10:22:10.3618333Z #44 0.557 DEBUG Unnecessary package: aiosignal==1.4.0 2025-09-07T10:22:10.3618805Z #44 0.557 DEBUG Unnecessary package: annotated-types==0.7.0 2025-09-07T10:22:10.3619280Z #44 0.557 DEBUG Unnecessary package: anyio==4.10.0 2025-09-07T10:22:10.3619700Z #44 0.557 DEBUG Unnecessary package: astor==0.8.1 2025-09-07T10:22:10.3620138Z #44 0.557 DEBUG Unnecessary package: attrs==25.3.0 2025-09-07T10:22:10.3620559Z #44 0.557 DEBUG Unnecessary package: blake3==1.0.5 2025-09-07T10:22:10.3621036Z #44 0.557 DEBUG Unnecessary package: build==1.3.0 2025-09-07T10:22:10.3621489Z #44 0.557 DEBUG Unnecessary package: cachetools==6.2.0 2025-09-07T10:22:10.3621924Z #44 0.557 DEBUG Unnecessary package: cbor2==5.7.0 2025-09-07T10:22:10.3622373Z #44 0.557 DEBUG Unnecessary package: certifi==2025.8.3 2025-09-07T10:22:10.3622925Z #44 0.557 DEBUG Unnecessary package: cffi==1.17.1 2025-09-07T10:22:10.3623421Z #44 0.557 DEBUG Unnecessary package: charset-normalizer==3.4.3 2025-09-07T10:22:10.3623995Z #44 0.557 DEBUG Unnecessary package: click==8.2.1 2025-09-07T10:22:10.3624576Z #44 0.557 DEBUG Unnecessary package: cloudpickle==3.1.1 2025-09-07T10:22:10.3625078Z #44 0.557 DEBUG Unnecessary package: compressed-tensors==0.11.0 2025-09-07T10:22:10.3625568Z #44 0.557 DEBUG Unnecessary package: cupy-cuda12x==13.6.0 2025-09-07T10:22:10.3626018Z #44 0.557 DEBUG Unnecessary package: depyf==0.19.0 2025-09-07T10:22:10.3626436Z #44 0.557 DEBUG Unnecessary package: dill==0.4.0 2025-09-07T10:22:10.3626848Z #44 0.557 DEBUG Unnecessary package: diskcache==5.6.3 2025-09-07T10:22:10.3627276Z #44 0.557 DEBUG Unnecessary package: distro==1.9.0 2025-09-07T10:22:10.3627695Z #44 0.557 DEBUG Unnecessary package: dnspython==2.7.0 
2025-09-07T10:22:10.3628129Z #44 0.557 DEBUG Unnecessary package: einops==0.8.1 2025-09-07T10:22:10.3628570Z #44 0.557 DEBUG Unnecessary package: email-validator==2.3.0 2025-09-07T10:22:10.3629035Z #44 0.557 DEBUG Unnecessary package: fastapi==0.116.1 2025-09-07T10:22:10.3629477Z #44 0.557 DEBUG Unnecessary package: fastapi-cli==0.0.10 2025-09-07T10:22:10.3629962Z #44 0.557 DEBUG Unnecessary package: fastapi-cloud-cli==0.1.5 2025-09-07T10:22:10.3630435Z #44 0.557 DEBUG Unnecessary package: fastrlock==0.8.3 2025-09-07T10:22:10.3630869Z #44 0.557 DEBUG Unnecessary package: frozendict==2.4.6 2025-09-07T10:22:10.3631348Z #44 0.557 DEBUG Unnecessary package: frozenlist==1.7.0 2025-09-07T10:22:10.3631768Z #44 0.557 DEBUG Unnecessary package: gguf==0.17.1 2025-09-07T10:22:10.3632444Z #44 0.557 DEBUG Unnecessary package: h11==0.16.0 2025-09-07T10:22:10.3632846Z #44 0.557 DEBUG Unnecessary package: hf-xet==1.1.9 2025-09-07T10:22:10.3633280Z #44 0.557 DEBUG Unnecessary package: httpcore==1.0.9 2025-09-07T10:22:10.3633727Z #44 0.557 DEBUG Unnecessary package: httptools==0.6.4 2025-09-07T10:22:10.3634195Z #44 0.557 DEBUG Unnecessary package: httpx==0.28.1 2025-09-07T10:22:10.3634658Z #44 0.557 DEBUG Unnecessary package: huggingface-hub==0.34.4 2025-09-07T10:22:10.3635103Z #44 0.557 DEBUG Unnecessary package: idna==3.10 2025-09-07T10:22:10.3635537Z #44 0.557 DEBUG Unnecessary package: interegular==0.3.3 2025-09-07T10:22:10.3635963Z #44 0.557 DEBUG Unnecessary package: jiter==0.10.0 2025-09-07T10:22:10.3636405Z #44 0.557 DEBUG Unnecessary package: jsonschema==4.25.1 2025-09-07T10:22:10.3636927Z #44 0.557 DEBUG Unnecessary package: jsonschema-specifications==2025.4.1 2025-09-07T10:22:10.3637453Z #44 0.557 DEBUG Unnecessary package: lark==1.2.2 2025-09-07T10:22:10.3637891Z #44 0.557 DEBUG Unnecessary package: llguidance==0.7.30 2025-09-07T10:22:10.3638325Z #44 0.557 DEBUG Unnecessary package: llvmlite==0.44.0 2025-09-07T10:22:10.3638808Z #44 0.557 DEBUG Unnecessary package: 
lm-format-enforcer==0.11.3 2025-09-07T10:22:10.3639335Z #44 0.557 DEBUG Unnecessary package: markdown-it-py==4.0.0 2025-09-07T10:22:10.3639791Z #44 0.557 DEBUG Unnecessary package: mdurl==0.1.2 2025-09-07T10:22:10.3640226Z #44 0.557 DEBUG Unnecessary package: mistral-common==1.8.4 2025-09-07T10:22:10.3640688Z #44 0.557 DEBUG Unnecessary package: msgpack==1.1.1 2025-09-07T10:22:10.3641127Z #44 0.557 DEBUG Unnecessary package: msgspec==0.19.0 2025-09-07T10:22:10.3641556Z #44 0.557 DEBUG Unnecessary package: multidict==6.6.4 2025-09-07T10:22:10.3641993Z #44 0.557 DEBUG Unnecessary package: ninja==1.13.0 2025-09-07T10:22:10.3642403Z #44 0.557 DEBUG Unnecessary package: numba==0.61.2 2025-09-07T10:22:10.3642837Z #44 0.557 DEBUG Unnecessary package: openai==1.106.1 2025-09-07T10:22:10.3643289Z #44 0.557 DEBUG Unnecessary package: openai-harmony==0.0.4 2025-09-07T10:22:10.3643832Z #44 0.557 DEBUG Unnecessary package: opencv-python-headless==4.12.0.88 2025-09-07T10:22:10.3644348Z #44 0.557 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T10:22:10.3644817Z #44 0.557 DEBUG Unnecessary package: outlines-core==0.2.10 2025-09-07T10:22:10.3645288Z #44 0.557 DEBUG Unnecessary package: packaging==25.0 2025-09-07T10:22:10.3645797Z #44 0.557 DEBUG Unnecessary package: partial-json-parser==0.2.1.1.post6 2025-09-07T10:22:10.3646738Z #44 0.557 DEBUG Unnecessary package: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:22:10.3647542Z #44 0.557 DEBUG Preserving seed package: pip==25.2 2025-09-07T10:22:10.3648026Z #44 0.557 DEBUG Unnecessary package: prometheus-client==0.22.1 2025-09-07T10:22:10.3648618Z #44 0.557 DEBUG Unnecessary package: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T10:22:10.3649418Z #44 0.557 DEBUG Unnecessary package: propcache==0.3.2 2025-09-07T10:22:10.3650045Z #44 0.557 DEBUG Unnecessary package: protobuf==6.32.0 2025-09-07T10:22:10.3650550Z #44 0.557 DEBUG Unnecessary package: 
psutil==7.0.0 2025-09-07T10:22:10.3651013Z #44 0.557 DEBUG Unnecessary package: py-cpuinfo==9.0.0 2025-09-07T10:22:10.3651469Z #44 0.557 DEBUG Unnecessary package: pybase64==1.4.2 2025-09-07T10:22:10.3651929Z #44 0.557 DEBUG Unnecessary package: pycountry==24.6.1 2025-09-07T10:22:10.3652378Z #44 0.557 DEBUG Unnecessary package: pycparser==2.22 2025-09-07T10:22:10.3652837Z #44 0.557 DEBUG Unnecessary package: pydantic==2.11.7 2025-09-07T10:22:10.3653312Z #44 0.557 DEBUG Unnecessary package: pydantic-core==2.33.2 2025-09-07T10:22:10.3653831Z #44 0.557 DEBUG Unnecessary package: pydantic-extra-types==2.10.5 2025-09-07T10:22:10.3654346Z #44 0.557 DEBUG Unnecessary package: pygments==2.19.2 2025-09-07T10:22:10.3654886Z #44 0.557 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T10:22:10.3655393Z #44 0.557 DEBUG Unnecessary package: python-dotenv==1.1.1 2025-09-07T10:22:10.3655893Z #44 0.557 DEBUG Unnecessary package: python-json-logger==3.3.0 2025-09-07T10:22:10.3656426Z #44 0.557 DEBUG Unnecessary package: python-multipart==0.0.20 2025-09-07T10:22:10.3656913Z #44 0.557 DEBUG Unnecessary package: pyzmq==27.0.2 2025-09-07T10:22:10.3657378Z #44 0.557 DEBUG Unnecessary package: ray==2.49.1 2025-09-07T10:22:10.3657834Z #44 0.557 DEBUG Unnecessary package: referencing==0.36.2 2025-09-07T10:22:10.3658290Z #44 0.557 DEBUG Unnecessary package: regex==2025.9.1 2025-09-07T10:22:10.3658745Z #44 0.557 DEBUG Unnecessary package: requests==2.32.5 2025-09-07T10:22:10.3659177Z #44 0.557 DEBUG Unnecessary package: rich==14.1.0 2025-09-07T10:22:10.3659639Z #44 0.557 DEBUG Unnecessary package: rich-toolkit==0.15.1 2025-09-07T10:22:10.3660088Z #44 0.557 DEBUG Unnecessary package: rignore==0.6.4 2025-09-07T10:22:10.3660540Z #44 0.557 DEBUG Unnecessary package: rpds-py==0.27.1 2025-09-07T10:22:10.3661007Z #44 0.557 DEBUG Unnecessary package: safetensors==0.6.2 2025-09-07T10:22:10.3661446Z #44 0.557 DEBUG Unnecessary package: scipy==1.16.1 2025-09-07T10:22:10.3661906Z #44 0.557 
DEBUG Unnecessary package: sentencepiece==0.2.1 2025-09-07T10:22:10.3662473Z #44 0.557 DEBUG Unnecessary package: sentry-sdk==2.37.0 2025-09-07T10:22:10.3662976Z #44 0.557 DEBUG Unnecessary package: setproctitle==1.3.7 2025-09-07T10:22:10.3663433Z #44 0.557 DEBUG Unnecessary package: shellingham==1.5.4 2025-09-07T10:22:10.3663869Z #44 0.557 DEBUG Unnecessary package: six==1.17.0 2025-09-07T10:22:10.3664279Z #44 0.557 DEBUG Unnecessary package: sniffio==1.3.1 2025-09-07T10:22:10.3664722Z #44 0.557 DEBUG Unnecessary package: soundfile==0.13.1 2025-09-07T10:22:10.3665172Z #44 0.557 DEBUG Unnecessary package: soxr==0.5.0.post1 2025-09-07T10:22:10.3665603Z #44 0.557 DEBUG Unnecessary package: starlette==0.47.3 2025-09-07T10:22:10.3666045Z #44 0.557 DEBUG Unnecessary package: tiktoken==0.11.0 2025-09-07T10:22:10.3666480Z #44 0.557 DEBUG Unnecessary package: tokenizers==0.22.0 2025-09-07T10:22:10.3667418Z #44 0.557 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250901+cu129 (from file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:22:10.3668857Z #44 0.557 DEBUG Unnecessary package: torchvision==0.24.0.dev20250901+cu129 (from file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:22:10.3669799Z #44 0.557 DEBUG Unnecessary package: tqdm==4.67.1 2025-09-07T10:22:10.3670232Z #44 0.557 DEBUG Unnecessary package: transformers==4.56.1 2025-09-07T10:22:10.3670723Z #44 0.557 DEBUG Unnecessary package: triton==3.4.0 2025-09-07T10:22:10.3671137Z #44 0.557 DEBUG Unnecessary package: typer==0.17.4 2025-09-07T10:22:10.3671609Z #44 0.557 DEBUG Unnecessary package: typing-inspection==0.4.1 2025-09-07T10:22:10.3672081Z #44 0.557 DEBUG Unnecessary package: urllib3==2.5.0 2025-09-07T10:22:10.3672516Z #44 0.557 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T10:22:10.3672951Z #44 0.557 DEBUG Unnecessary package: uvicorn==0.35.0 2025-09-07T10:22:10.3673368Z #44 0.557 DEBUG Unnecessary 
package: uvloop==0.21.0 2025-09-07T10:22:10.3674394Z #44 0.557 DEBUG Unnecessary package: vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129 (from file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl) 2025-09-07T10:22:10.3675438Z #44 0.557 DEBUG Unnecessary package: watchfiles==1.1.0 2025-09-07T10:22:10.3675929Z #44 0.557 DEBUG Unnecessary package: websockets==15.0.1 2025-09-07T10:22:10.3676372Z #44 0.557 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T10:22:10.3676795Z #44 0.557 DEBUG Unnecessary package: xgrammar==0.1.23 2025-09-07T10:22:10.3677222Z #44 0.557 DEBUG Unnecessary package: yarl==1.20.1 2025-09-07T10:22:12.6084870Z #44 2.996 Prepared 1 package in 2.43s 2025-09-07T10:22:13.0598400Z #44 3.447 Installed 1 package in 451ms 2025-09-07T10:22:13.0599491Z #44 3.447 + xformers==0.0.33+5d4b92a5.d20250907 (from file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl) 2025-09-07T10:22:13.2100620Z #44 3.447 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T10:22:28.7877044Z #44 DONE 19.2s 2025-09-07T10:22:28.9408050Z 2025-09-07T10:22:28.9408575Z #45 [vllm-base 13/18] RUN pip install build==1.3.0 2025-09-07T10:22:29.5705948Z #45 0.780 Requirement already satisfied: build==1.3.0 in /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages (1.3.0) 2025-09-07T10:22:29.7223969Z #45 0.782 Requirement already satisfied: packaging>=19.1 in /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages (from build==1.3.0) (25.0) 2025-09-07T10:22:29.8478185Z #45 0.782 Requirement already satisfied: pyproject_hooks in /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages (from build==1.3.0) (1.2.0) 2025-09-07T10:22:29.8480528Z #45 1.058 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-09-07T10:22:29.9683395Z #45 DONE 1.2s 2025-09-07T10:22:30.1212329Z 2025-09-07T10:22:30.1213188Z #46 [vllm-base 14/18] RUN pip freeze | grep -E 'setuptools|packaging|build' 2025-09-07T10:22:31.0085957Z #46 1.038 build==1.3.0 2025-09-07T10:22:31.0086341Z #46 1.038 packaging==25.0 2025-09-07T10:22:31.0086792Z #46 1.038 setuptools @ file:///dist/setuptools-78.1.0-py3-none-any.whl 2025-09-07T10:22:31.1807572Z #46 DONE 1.1s 2025-09-07T10:22:31.1808070Z 2025-09-07T10:22:31.1813154Z #47 [vllm-base 15/18] RUN --mount=type=cache,target=/root/.cache/uv git clone --depth 1 --recursive --shallow-submodules --branch v0.2.14.post1 https://github.com/flashinfer-ai/flashinfer.git flashinfer && echo "Building FlashInfer with AOT for arches: 8.0;8.9;9.0;10.0;12.0" && cd flashinfer && python3 -m flashinfer.aot && python3 -m build --no-isolation --wheel --outdir ../wheels/flashinfer && cd .. && rm -rf flashinfer 2025-09-07T10:22:31.8275871Z #47 0.798 Cloning into 'flashinfer'... 2025-09-07T10:22:32.2927450Z #47 1.263 Note: switching to '038032209794e4ef4608324723efc979a06d5239'. 2025-09-07T10:22:32.2928333Z #47 1.263 2025-09-07T10:22:32.2928848Z #47 1.263 You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T10:22:32.2929532Z #47 1.263 changes and commit them, and you can discard any commits you make in this 2025-09-07T10:22:32.2930419Z #47 1.263 state without impacting any branches by switching back to a branch. 2025-09-07T10:22:32.2931046Z #47 1.263 2025-09-07T10:22:32.2931617Z #47 1.263 If you want to create a new branch to retain commits you create, you may 2025-09-07T10:22:32.2932230Z #47 1.263 do so (now or later) by using -c with the switch command. 
Example: 2025-09-07T10:22:32.2932690Z #47 1.263 2025-09-07T10:22:32.2932965Z #47 1.263 git switch -c <new-branch-name> 2025-09-07T10:22:32.2933319Z #47 1.263 2025-09-07T10:22:32.2933564Z #47 1.263 Or undo this operation with: 2025-09-07T10:22:32.2933899Z #47 1.263 2025-09-07T10:22:32.2934138Z #47 1.263 git switch - 2025-09-07T10:22:32.2934399Z #47 1.263 2025-09-07T10:22:32.2934826Z #47 1.263 Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T10:22:32.2935339Z #47 1.263 2025-09-07T10:22:32.4052689Z #47 1.375 Submodule '3rdparty/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path '3rdparty/cutlass' 2025-09-07T10:22:32.5631085Z #47 1.376 Submodule '3rdparty/spdlog' (https://github.com/gabime/spdlog.git) registered for path '3rdparty/spdlog' 2025-09-07T10:22:32.5633211Z #47 1.382 Cloning into '/workspace/flashinfer/3rdparty/cutlass'... 2025-09-07T10:22:34.7872379Z #47 3.757 Cloning into '/workspace/flashinfer/3rdparty/spdlog'... 2025-09-07T10:22:35.6444300Z #47 4.615 From https://github.com/NVIDIA/cutlass 2025-09-07T10:22:35.6445144Z #47 4.615 * branch e51efbfe18fe4f4cbb66ab814c55bf4aa0185491 -> FETCH_HEAD 2025-09-07T10:22:36.3829732Z #47 5.353 Submodule path '3rdparty/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T10:22:36.7480170Z #47 5.718 From https://github.com/gabime/spdlog 2025-09-07T10:22:36.7480774Z #47 5.718 * branch c3aed4b68373955e1cc94307683d44dca1515d2b -> FETCH_HEAD 2025-09-07T10:22:36.9361418Z #47 5.743 Submodule path '3rdparty/spdlog': checked out 'c3aed4b68373955e1cc94307683d44dca1515d2b' 2025-09-07T10:22:36.9362301Z #47 5.756 Building FlashInfer with AOT for arches: 8.0;8.9;9.0;10.0;12.0 2025-09-07T10:22:41.2451968Z #47 10.22 W0907 10:22:41.243000 204 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:119] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 2025-09-07T10:22:41.9887013Z #47 10.96 AOT build summary: 
2025-09-07T10:22:41.9887430Z #47 10.96 out_dir: /workspace/flashinfer/aot-ops 2025-09-07T10:22:41.9887902Z #47 10.96 build_dir: /workspace/flashinfer/build/aot 2025-09-07T10:22:41.9888306Z #47 10.96 fa2_head_dim: [(64, 64), (128, 128)] 2025-09-07T10:22:41.9888688Z #47 10.96 fa3_head_dim: [(192, 128), (128, 128)] 2025-09-07T10:22:41.9889085Z #47 10.96 f16_dtype: [torch.float16, torch.bfloat16] 2025-09-07T10:22:41.9889494Z #47 10.96 f8_dtype: [torch.float8_e4m3fn] 2025-09-07T10:22:41.9890000Z #47 10.96 use_sliding_window: [False] 2025-09-07T10:22:41.9890358Z #47 10.96 use_logits_soft_cap: [False] 2025-09-07T10:22:41.9890867Z #47 10.96 TORCH_CUDA_ARCH_LIST: 8.0;8.9;9.0;10.0;12.0 2025-09-07T10:22:41.9891419Z #47 10.96 has_sm90: True 2025-09-07T10:22:41.9891715Z #47 10.96 has_sm100: True 2025-09-07T10:22:41.9891998Z #47 10.96 add_comm: False 2025-09-07T10:22:41.9892298Z #47 10.96 add_gemma: False 2025-09-07T10:22:41.9892593Z #47 10.96 add_oai_oss: True 2025-09-07T10:22:41.9892896Z #47 10.96 add_moe: False 2025-09-07T10:22:41.9893174Z #47 10.96 add_act: False 2025-09-07T10:22:41.9893467Z #47 10.96 add_misc: True 2025-09-07T10:22:41.9893756Z #47 10.96 Generating JIT specs... 
2025-09-07T10:22:41.9894082Z #47 10.96 Total ops: 60 2025-09-07T10:22:42.1418472Z #47 10.96 ninja: Entering directory `/workspace/flashinfer/build/aot/cached_ops' 2025-09-07T10:23:25.0056441Z #47 53.97 [1/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o 
batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:23:34.8325186Z #47 63.80 [2/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o 
single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:23:42.1304068Z #47 71.10 [3/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:23:46.0406228Z #47 75.01 [4/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:23:46.8102967Z #47 75.78 [5/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:23:47.0806167Z #47 76.05 [6/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:23:47.3406162Z #47 76.31 [7/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:23:47.5701309Z #47 76.54 [8/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:23:47.8013404Z #47 76.77 [9/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:23:48.2414223Z #47 77.21 [10/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:23:48.4089454Z #47 77.23 [11/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:23:50.8935830Z #47 79.86 [12/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:23:53.2109108Z #47 82.18 [13/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:23:56.1007288Z #47 85.07 [14/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:23:58.1396224Z #47 87.11 [15/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:23:59.5496669Z #47 88.52 [16/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:24:00.1130259Z #47 89.08 [17/412] c++ -MMD -MF logging/logging.o.d -DTORCH_EXTENSION_NAME=logging -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/3rdparty/spdlog/include -I/workspace/flashinfer/include -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/logging.cc -o logging/logging.o 2025-09-07T10:24:01.6805710Z #47 90.65 [18/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:24:05.0899534Z #47 94.06 [19/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:24:05.9601208Z #47 94.93 [20/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:24:06.1303269Z #47 95.10 [21/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:24:08.3050101Z #47 97.27 [22/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:24:09.7099207Z #47 98.68 [23/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:24:11.0295640Z #47 100.00 [24/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:24:12.1118950Z #47 101.1 [25/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:24:12.7998310Z #47 101.8 [26/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:24:13.2507026Z #47 102.2 [27/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:24:27.0407194Z #47 116.0 [28/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:24:31.0604604Z #47 120.0 [29/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:24:31.1700233Z #47 120.1 [30/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:24:36.3766674Z #47 125.3 [31/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:24:39.7881227Z #47 128.8 [32/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:24:41.0600353Z #47 130.0 [33/412] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:24:42.3149817Z #47 131.3 [34/412] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:24:45.9078718Z #47 
134.9 [35/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:24:51.8203237Z #47 140.8 [36/412] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:24:53.1095596Z #47 142.1 [37/412] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 
2025-09-07T10:24:54.1410173Z #47 143.1 [38/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:25:01.7581784Z #47 150.7 [39/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:25:03.8200082Z #47 152.8 [40/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:25:04.8932746Z #47 153.9 [41/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:25:08.8733298Z #47 157.8 [42/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:25:13.5438351Z #47 162.5 [43/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:25:13.9643445Z #47 162.9 [44/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:25:15.5947011Z #47 164.6 [45/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:25:15.9394040Z #47 164.9 [46/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:25:16.8207440Z #47 165.8 [47/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:25:17.1256324Z #47 166.1 [48/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:25:18.0402307Z #47 167.0 [49/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:25:19.7959443Z #47 168.8 [50/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:25:19.9105242Z #47 168.9 [51/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:25:20.1091586Z #47 168.9 [52/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:25:20.5664239Z #47 169.5 [53/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:25:21.4393223Z #47 170.4 [54/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:25:21.8108584Z #47 170.8 [55/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:25:22.9821261Z #47 172.0 [56/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:25:23.1915746Z #47 172.0 [57/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:25:23.9195830Z #47 172.9 [58/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:25:25.9334567Z #47 174.9 [59/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:25:28.7621077Z #47 177.7 [60/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:25:29.3095131Z #47 178.3 [61/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:25:29.7697222Z #47 178.7 [62/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:25:30.7076470Z #47 179.7 [63/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:25:34.2033225Z #47 183.2 [64/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:25:38.6994855Z #47 187.7 [65/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:25:41.8592662Z #47 190.8 [66/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:25:56.4490487Z #47 205.4 [67/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:25:56.6312267Z #47 205.4 [68/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:26:00.0792613Z #47 209.0 [69/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:26:04.5096519Z #47 213.5 [70/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:26:05.4797601Z #47 214.4 [71/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:26:05.6126770Z #47 214.6 [72/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:26:08.5881869Z #47 217.6 [73/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:26:08.8907703Z #47 217.9 [74/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:26:12.5600280Z #47 221.5 [75/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:26:15.7722384Z #47 224.7 [76/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:26:17.1866932Z #47 226.2 [77/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:26:21.7691665Z #47 230.7 [78/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:26:24.8776919Z #47 233.8 [79/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:26:25.1994268Z #47 234.2 [80/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:26:25.3972109Z #47 234.4 [81/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:26:25.9814925Z #47 234.9 [82/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:26:27.6642388Z #47 236.6 [83/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:26:31.6240160Z #47 240.6 [84/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:26:32.3196015Z #47 241.3 [85/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:26:32.9813810Z #47 242.0 [86/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:26:33.4391405Z #47 242.4 [87/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:26:35.3299013Z #47 244.3 [88/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:26:38.9705083Z #47 247.9 [89/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:26:39.9106960Z #47 248.9 [90/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:26:42.4393857Z #47 251.4 [91/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:26:44.1537074Z #47 253.1 [92/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:26:46.8588695Z #47 255.8 [93/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:27:10.4709014Z #47 279.4 [94/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:27:15.4144081Z #47 284.4 [95/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:27:16.5109774Z #47 285.5 [96/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:27:18.5505595Z #47 287.5 [97/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:27:19.4710021Z #47 288.4 [98/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:27:23.0308483Z #47 292.0 [99/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:27:27.9009517Z #47 296.9 [100/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:27:29.4505836Z #47 298.4 [101/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:27:33.4795500Z #47 302.4 [102/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:27:34.2898697Z #47 303.3 [103/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:27:35.3037672Z #47 304.3 [104/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:27:36.6842708Z #47 305.7 [105/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:27:41.0896189Z #47 310.1 [106/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:27:41.8405849Z #47 310.8 [107/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:27:46.3998676Z #47 315.4 [108/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:27:46.8007034Z #47 315.8 [109/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:27:47.0467156Z #47 316.0 [110/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:27:50.7707172Z #47 319.7 [111/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:27:53.2306749Z #47 322.2 [112/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:27:53.3837322Z #47 322.4 [113/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:27:53.5778222Z #47 322.5 [114/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:27:53.7211768Z #47 322.6 [115/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:27:53.7247902Z #47 322.7 [116/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:27:54.1932107Z #47 323.2 [117/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:27:55.2520661Z #47 324.2 [118/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:27:56.3203470Z #47 325.3 [119/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:27:59.1540108Z #47 328.1 [120/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:27:59.9008427Z #47 328.9 [121/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:28:00.7509258Z #47 329.7 [122/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:28:02.4323810Z #47 331.4 [123/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:28:05.7699005Z #47 334.7 [124/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:28:06.6295374Z #47 335.6 [125/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:28:09.1100037Z #47 338.1 [126/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:28:10.3699817Z #47 339.3 [127/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:28:10.4794639Z #47 339.4 [128/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:28:15.4095984Z #47 344.4 [129/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:28:15.5495448Z #47 344.5 [130/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:28:24.4277000Z #47 353.4 [131/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:28:28.4871033Z #47 357.5 [132/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 
-use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:28:29.6891389Z #47 358.7 [133/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:28:30.5495638Z #47 359.5 [134/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:28:34.6101514Z #47 363.6 [135/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:28:35.6498843Z #47 364.6 [136/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:28:36.2894603Z #47 365.3 [137/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false 
-gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:28:43.2301662Z #47 372.2 [138/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:28:46.6304844Z #47 375.6 [139/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:28:47.9605290Z #47 376.9 [140/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 
-std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:28:48.0708450Z #47 377.0 [141/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:28:48.3732449Z #47 377.3 [142/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:28:48.7728166Z #47 377.7 [143/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:28:49.0896425Z #47 378.1 [144/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:28:54.0707866Z #47 383.0 [145/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:28:54.4286177Z #47 383.4 [146/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:29:09.2287955Z #47 398.2 [147/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:29:21.5520934Z #47 410.5 [148/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:29:25.3718691Z #47 414.3 [149/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T10:29:26.4932633Z #47 415.5 [150/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:29:27.2395437Z #47 416.2 [151/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T10:29:30.1385872Z #47 419.1 [152/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T10:29:31.0012283Z #47 420.0 [153/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:29:31.1096265Z #47 420.1 [154/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:29:31.3021777Z #47 420.1 [155/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:29:31.7554298Z #47 420.7 [156/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:29:33.9129275Z #47 422.9 [157/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:29:35.5244843Z #47 424.5 [158/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T10:29:35.9107255Z #47 424.9 [159/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T10:29:37.3897676Z #47 426.4 [160/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:29:37.8604533Z #47 426.8 [161/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:29:38.0303111Z #47 426.8 [162/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:29:39.1492146Z #47 428.1 [163/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:29:40.3105275Z #47 429.3 [164/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T10:29:44.1693239Z #47 433.1 [165/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:29:44.3245102Z #47 433.3 [166/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:29:45.4641271Z #47 434.4 [167/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T10:29:45.9795501Z #47 434.9 [168/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:29:46.1790889Z #47 435.1 [169/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:29:46.5692872Z #47 435.5 [170/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:29:48.0503927Z #47 437.0 [171/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T10:29:48.1824387Z #47 437.1 [172/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:29:48.1858924Z #47 437.2 [173/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T10:29:48.4172416Z #47 437.4 [174/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:29:48.5287354Z #47 437.5 [175/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T10:29:49.8998720Z #47 438.9 [176/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T10:29:50.4870271Z #47 439.5 [177/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:29:50.7065017Z #47 439.7 [178/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:29:50.9870917Z #47 440.0 [179/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:29:53.1667574Z #47 442.1 [180/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:30:08.8026680Z #47 457.8 [181/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:30:11.3374303Z #47 460.3 [182/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:30:12.7405946Z #47 461.7 [183/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:30:12.9338361Z #47 461.9 [184/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:30:14.0587737Z #47 463.0 [185/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:30:14.8973013Z #47 463.9 [186/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:30:16.7891842Z #47 465.8 [187/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:30:17.7896798Z #47 466.8 [188/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T10:30:18.1704817Z #47 467.1 [189/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:30:19.1908081Z #47 468.2 [190/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:30:20.2988389Z #47 469.3 [191/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:30:21.1637364Z #47 470.1 [192/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:30:23.1538812Z #47 472.1 [193/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:30:23.3545013Z #47 472.2 [194/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:30:23.3563523Z #47 472.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:30:23.3566502Z #47 472.2 bool use_swa = window_left != -1; 2025-09-07T10:30:23.3567127Z #47 472.2 ^ 2025-09-07T10:30:23.3567535Z #47 472.2 2025-09-07T10:30:23.3568245Z #47 472.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:23.3569359Z #47 472.2 2025-09-07T10:30:23.3572207Z #47 472.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:30:23.3575075Z #47 472.2 bool use_swa = window_left != -1; 2025-09-07T10:30:23.3575696Z #47 472.2 ^ 2025-09-07T10:30:23.3576135Z #47 472.2 2025-09-07T10:30:24.0803356Z #47 473.0 [195/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:30:24.0818957Z #47 473.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:30:24.0821327Z #47 473.0 bool use_swa = 
window_left != -1; 2025-09-07T10:30:24.0821843Z #47 473.0 ^ 2025-09-07T10:30:24.0822225Z #47 473.0 2025-09-07T10:30:24.0822876Z #47 473.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:24.0823645Z #47 473.0 2025-09-07T10:30:24.0825904Z #47 473.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:30:24.0828210Z #47 473.0 bool use_swa = window_left != -1; 2025-09-07T10:30:24.0828762Z #47 473.0 ^ 2025-09-07T10:30:24.0829170Z #47 473.0 2025-09-07T10:30:26.4326572Z #47 475.4 [196/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T10:30:32.0179901Z #47 481.0 [197/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:30:47.0182056Z #47 496.0 [198/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:30:47.1673817Z #47 496.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.1676877Z #47 496.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.1677717Z #47 496.0 ^ 2025-09-07T10:30:47.1678218Z #47 496.0 2025-09-07T10:30:47.1678915Z #47 496.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.1679797Z #47 496.0 2025-09-07T10:30:47.1682203Z #47 496.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.1751277Z #47 496.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.1752006Z #47 496.0 ^ 2025-09-07T10:30:47.1752370Z #47 496.0 2025-09-07T10:30:47.1752939Z #47 496.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.1753613Z #47 496.0 2025-09-07T10:30:47.1755705Z #47 496.0 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.1758337Z #47 496.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.1759469Z #47 496.0 ^ 2025-09-07T10:30:47.1759932Z #47 496.0 2025-09-07T10:30:47.1760533Z #47 496.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.1761383Z #47 496.0 2025-09-07T10:30:47.1763851Z #47 496.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.1766545Z #47 496.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.1767368Z #47 496.0 ^ 2025-09-07T10:30:47.1767825Z #47 496.0 2025-09-07T10:30:47.1768496Z #47 496.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.1769610Z #47 496.0 2025-09-07T10:30:47.1772417Z #47 496.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.1775129Z #47 496.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.1775816Z #47 496.0 ^ 2025-09-07T10:30:47.1776239Z #47 496.0 2025-09-07T10:30:47.1776925Z #47 496.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.1777773Z #47 
496.0 2025-09-07T10:30:47.9066476Z #47 496.9 [199/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:30:47.9081666Z #47 496.9 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.9084075Z #47 496.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.9084951Z #47 496.9 ^ 2025-09-07T10:30:47.9085380Z #47 496.9 2025-09-07T10:30:47.9085984Z #47 496.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.9086706Z #47 496.9 2025-09-07T10:30:47.9088806Z #47 496.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.9091322Z #47 496.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.9092035Z #47 496.9 ^ 2025-09-07T10:30:47.9092448Z #47 496.9 2025-09-07T10:30:47.9093045Z #47 496.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.9093810Z #47 496.9 2025-09-07T10:30:47.9096116Z #47 496.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.9098699Z #47 496.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.9099400Z #47 496.9 ^ 2025-09-07T10:30:47.9099825Z #47 496.9 2025-09-07T10:30:47.9100449Z #47 496.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.9101191Z #47 496.9 
2025-09-07T10:30:47.9103259Z #47 496.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.9105706Z #47 496.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.9106401Z #47 496.9 ^ 2025-09-07T10:30:47.9106797Z #47 496.9 2025-09-07T10:30:47.9107421Z #47 496.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.9108138Z #47 496.9 2025-09-07T10:30:47.9110268Z #47 496.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:47.9112724Z #47 496.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:30:47.9113408Z #47 496.9 ^ 2025-09-07T10:30:47.9113867Z #47 496.9 2025-09-07T10:30:47.9114461Z #47 496.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:47.9115214Z #47 496.9 2025-09-07T10:30:50.1691894Z #47 499.1 [200/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:30:50.1706463Z #47 499.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.1708956Z #47 499.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.1709626Z #47 499.1 ^ 2025-09-07T10:30:50.1709997Z #47 499.1 2025-09-07T10:30:50.1710546Z #47 499.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.1711237Z #47 499.1 2025-09-07T10:30:50.1713138Z #47 
499.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.1715423Z #47 499.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.1716113Z #47 499.1 ^ 2025-09-07T10:30:50.1716475Z #47 499.1 2025-09-07T10:30:50.1717042Z #47 499.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.1717741Z #47 499.1 2025-09-07T10:30:50.1719668Z #47 499.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.1721941Z #47 499.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.1722605Z #47 499.1 ^ 2025-09-07T10:30:50.1722978Z #47 499.1 2025-09-07T10:30:50.1723540Z #47 499.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.1724236Z #47 499.1 2025-09-07T10:30:50.1726146Z #47 499.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.1728672Z #47 499.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.1729340Z #47 499.1 ^ 2025-09-07T10:30:50.1729702Z #47 499.1 2025-09-07T10:30:50.1730263Z #47 499.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.1731093Z 
#47 499.1 2025-09-07T10:30:50.1733319Z #47 499.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.1735723Z #47 499.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.1736373Z #47 499.1 ^ 2025-09-07T10:30:50.1736748Z #47 499.1 2025-09-07T10:30:50.1737308Z #47 499.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.1737997Z #47 499.1 2025-09-07T10:30:50.3972072Z #47 499.4 [201/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:30:50.3986280Z #47 499.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.3988589Z #47 499.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.3989273Z #47 499.4 ^ 2025-09-07T10:30:50.3989632Z #47 499.4 2025-09-07T10:30:50.3990209Z #47 499.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.3990914Z #47 499.4 2025-09-07T10:30:50.3992852Z #47 499.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.3995363Z #47 499.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.3996022Z #47 499.4 ^ 2025-09-07T10:30:50.3996390Z #47 499.4 2025-09-07T10:30:50.3996940Z #47 499.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.3997639Z #47 499.4 2025-09-07T10:30:50.3999556Z #47 
499.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.4002353Z #47 499.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.4003025Z #47 499.4 ^ 2025-09-07T10:30:50.4003393Z #47 499.4 2025-09-07T10:30:50.4003977Z #47 499.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.4004653Z #47 499.4 2025-09-07T10:30:50.4006584Z #47 499.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.4008850Z #47 499.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.4009513Z #47 499.4 ^ 2025-09-07T10:30:50.4009889Z #47 499.4 2025-09-07T10:30:50.4010549Z #47 499.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:50.4011485Z #47 499.4 2025-09-07T10:30:50.4013484Z #47 499.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:50.4015781Z #47 499.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:30:50.4016454Z #47 499.4 ^ 2025-09-07T10:30:50.4016815Z #47 499.4 2025-09-07T10:30:50.4017378Z #47 499.4 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:30:50.4018055Z #47 499.4 2025-09-07T10:30:51.5005738Z #47 500.5 [202/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:30:51.5023713Z #47 500.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:30:51.5026326Z #47 500.5 bool use_swa = window_left != -1; 2025-09-07T10:30:51.5027128Z #47 500.5 ^ 2025-09-07T10:30:51.5027510Z #47 500.5 2025-09-07T10:30:51.5028144Z #47 500.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:51.5029010Z #47 500.5 2025-09-07T10:30:51.5031376Z #47 500.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:30:51.5033957Z #47 500.5 bool use_swa = window_left != -1; 2025-09-07T10:30:51.5034545Z #47 500.5 ^ 2025-09-07T10:30:51.5034926Z #47 500.5 2025-09-07T10:30:52.0181928Z #47 501.0 [203/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem 
/workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:30:52.0196046Z #47 501.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:52.0198375Z #47 501.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:52.0199252Z #47 501.0 ^ 2025-09-07T10:30:52.0199611Z #47 501.0 2025-09-07T10:30:52.0200183Z #47 501.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:52.0200871Z #47 501.0 2025-09-07T10:30:52.0202786Z #47 501.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: 
variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:52.0205060Z #47 501.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:52.0205718Z #47 501.0 ^ 2025-09-07T10:30:52.0206090Z #47 501.0 2025-09-07T10:30:52.0206646Z #47 501.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:52.0207501Z #47 501.0 2025-09-07T10:30:52.0209397Z #47 501.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:52.0211819Z #47 501.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:52.0212543Z #47 501.0 ^ 2025-09-07T10:30:52.0212901Z #47 501.0 2025-09-07T10:30:52.0213462Z #47 501.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:52.0214132Z #47 501.0 2025-09-07T10:30:52.0216149Z #47 501.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:52.0218417Z #47 501.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:52.0219076Z #47 501.0 ^ 2025-09-07T10:30:52.0219513Z #47 501.0 2025-09-07T10:30:52.0220057Z #47 501.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:52.0220746Z #47 501.0 2025-09-07T10:30:52.0222642Z #47 501.0 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:52.0224892Z #47 501.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:52.0225584Z #47 501.0 ^ 2025-09-07T10:30:52.0225942Z #47 501.0 2025-09-07T10:30:52.0226516Z #47 501.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:52.0227194Z #47 501.0 2025-09-07T10:30:52.2035202Z #47 501.2 [204/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:30:54.1177640Z #47 503.1 [205/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:30:54.1193911Z #47 503.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.1196566Z #47 503.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:30:54.1197400Z #47 503.1 ^ 2025-09-07T10:30:54.1197807Z #47 503.1 2025-09-07T10:30:54.1198444Z #47 503.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.1199212Z #47 503.1 2025-09-07T10:30:54.1201293Z #47 503.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.1203889Z #47 503.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:30:54.1204893Z #47 503.1 ^ 2025-09-07T10:30:54.1205330Z #47 503.1 2025-09-07T10:30:54.1205959Z #47 503.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.1206729Z #47 503.1 2025-09-07T10:30:54.1208852Z #47 503.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.1211655Z #47 503.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:30:54.1212478Z #47 503.1 ^ 2025-09-07T10:30:54.1212897Z #47 503.1 2025-09-07T10:30:54.1213542Z #47 503.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.1214486Z #47 503.1 2025-09-07T10:30:54.1216559Z #47 503.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.1219096Z #47 503.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:30:54.1219885Z #47 503.1 ^ 2025-09-07T10:30:54.1220310Z #47 503.1 2025-09-07T10:30:54.1220944Z #47 503.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.1221696Z #47 503.1 2025-09-07T10:30:54.1223996Z #47 503.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.1226962Z #47 503.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:30:54.1227850Z #47 503.1 ^ 2025-09-07T10:30:54.1228251Z #47 503.1 2025-09-07T10:30:54.1228879Z #47 503.1 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:30:54.1229660Z #47 503.1 2025-09-07T10:30:54.6740920Z #47 503.6 [206/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:30:54.6758430Z #47 503.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.6760735Z #47 503.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:54.6761418Z #47 503.6 ^ 2025-09-07T10:30:54.6761779Z #47 503.6 2025-09-07T10:30:54.6762366Z #47 503.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.6763287Z #47 503.6 2025-09-07T10:30:54.6765200Z #47 503.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.6767468Z #47 503.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:54.6768122Z #47 503.6 ^ 2025-09-07T10:30:54.6768494Z #47 503.6 2025-09-07T10:30:54.6769038Z #47 503.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.6769727Z #47 503.6 2025-09-07T10:30:54.6771949Z #47 503.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.6774467Z #47 503.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:54.6775142Z #47 503.6 ^ 2025-09-07T10:30:54.6775659Z #47 503.6 2025-09-07T10:30:54.6776237Z #47 503.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.6776918Z 
#47 503.6 2025-09-07T10:30:54.6778820Z #47 503.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.6781089Z #47 503.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:54.6781759Z #47 503.6 ^ 2025-09-07T10:30:54.6782133Z #47 503.6 2025-09-07T10:30:54.6782720Z #47 503.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.6783419Z #47 503.6 2025-09-07T10:30:54.6785309Z #47 503.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:30:54.6787585Z #47 503.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:30:54.6788258Z #47 503.6 ^ 2025-09-07T10:30:54.6788621Z #47 503.6 2025-09-07T10:30:54.6789178Z #47 503.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:30:54.6789854Z #47 503.6 2025-09-07T10:31:03.3610094Z #47 512.3 [207/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:31:03.3630976Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:03.3634324Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:03.3635386Z #47 512.3 ^ 2025-09-07T10:31:03.3635894Z #47 512.3 2025-09-07T10:31:03.3636791Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:03.3637768Z #47 512.3 
2025-09-07T10:31:03.3640562Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:03.3643978Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:03.3645004Z #47 512.3 ^ 2025-09-07T10:31:03.3645482Z #47 512.3 2025-09-07T10:31:03.3646251Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:03.3647245Z #47 512.3 2025-09-07T10:31:03.3650328Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:03.3653879Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:03.3654919Z #47 512.3 ^ 2025-09-07T10:31:03.3655357Z #47 512.3 2025-09-07T10:31:03.3655987Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:03.3656771Z #47 512.3 2025-09-07T10:31:03.3659097Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:03.3662736Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:03.3663784Z #47 512.3 ^ 2025-09-07T10:31:03.3664302Z #47 512.3 2025-09-07T10:31:03.3665122Z #47 512.3 Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-09-07T10:31:03.3666115Z #47 512.3 2025-09-07T10:31:03.3668930Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:03.3672403Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:03.3673644Z #47 512.3 ^ 2025-09-07T10:31:03.3674144Z #47 512.3 2025-09-07T10:31:03.3674911Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:03.3675883Z #47 512.3 2025-09-07T10:31:11.2067389Z #47 520.2 [208/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 
-std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:31:16.9331933Z #47 525.9 [209/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:31:16.9355317Z #47 525.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:31:16.9358532Z #47 525.9 bool use_swa = window_left != -1; 2025-09-07T10:31:16.9359220Z #47 525.9 ^ 2025-09-07T10:31:16.9359695Z #47 525.9 2025-09-07T10:31:16.9360525Z #47 525.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:16.9361553Z #47 525.9 2025-09-07T10:31:16.9364668Z #47 525.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:31:16.9367801Z #47 525.9 bool use_swa = window_left != -1; 2025-09-07T10:31:16.9368360Z #47 525.9 ^ 2025-09-07T10:31:16.9368822Z #47 525.9 2025-09-07T10:31:18.5915557Z #47 527.6 [210/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:31:18.7813334Z #47 527.7 [211/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:31:18.7834657Z #47 527.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:18.7838162Z #47 527.7 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:18.7839137Z #47 527.7 ^ 2025-09-07T10:31:18.7839664Z #47 527.7 2025-09-07T10:31:18.7840454Z #47 527.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:18.7841486Z #47 527.7 2025-09-07T10:31:18.7844311Z #47 527.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:18.7847708Z #47 527.7 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:18.7848676Z #47 527.7 ^ 2025-09-07T10:31:18.7849480Z #47 527.7 2025-09-07T10:31:18.7850253Z #47 527.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:18.7851316Z #47 527.7 2025-09-07T10:31:18.7853989Z #47 527.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:18.7857377Z #47 527.7 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:18.7858335Z #47 527.7 ^ 2025-09-07T10:31:18.7858854Z #47 527.7 2025-09-07T10:31:18.7859669Z #47 527.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:18.7860920Z #47 
527.7 2025-09-07T10:31:18.7863751Z #47 527.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:18.7866968Z #47 527.7 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:18.7867933Z #47 527.7 ^ 2025-09-07T10:31:18.7868445Z #47 527.7 2025-09-07T10:31:18.7869278Z #47 527.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:18.7870331Z #47 527.7 2025-09-07T10:31:18.7873124Z #47 527.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:18.7876782Z #47 527.7 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:18.7877757Z #47 527.7 ^ 2025-09-07T10:31:18.7878281Z #47 527.7 2025-09-07T10:31:18.7879036Z #47 527.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:18.7879994Z #47 527.7 2025-09-07T10:31:20.3847032Z #47 529.4 [212/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:31:20.3861428Z #47 529.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:20.3863702Z #47 529.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:20.3864368Z #47 529.4 ^ 2025-09-07T10:31:20.3864727Z #47 529.4 2025-09-07T10:31:20.3865309Z #47 529.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:20.3866220Z #47 529.4 2025-09-07T10:31:20.3868132Z #47 
529.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:20.3870603Z #47 529.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:20.3871257Z #47 529.4 ^ 2025-09-07T10:31:20.3871638Z #47 529.4 2025-09-07T10:31:20.3872209Z #47 529.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:20.3872899Z #47 529.4 2025-09-07T10:31:20.3874859Z #47 529.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:20.3877316Z #47 529.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:20.3877978Z #47 529.4 ^ 2025-09-07T10:31:20.3878356Z #47 529.4 2025-09-07T10:31:20.3878909Z #47 529.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:20.3879611Z #47 529.4 2025-09-07T10:31:20.3881506Z #47 529.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:20.3883773Z #47 529.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:20.3884571Z #47 529.4 ^ 2025-09-07T10:31:20.3884947Z #47 529.4 2025-09-07T10:31:20.3885540Z #47 529.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:20.3886220Z 
#47 529.4 2025-09-07T10:31:20.3888258Z #47 529.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:20.3890519Z #47 529.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:31:20.3891333Z #47 529.4 ^ 2025-09-07T10:31:20.3891710Z #47 529.4 2025-09-07T10:31:20.3892261Z #47 529.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:20.3892957Z #47 529.4 2025-09-07T10:31:22.1183810Z #47 531.1 [213/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o 2025-09-07T10:31:22.1202117Z #47 531.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu(115): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:31:22.1205738Z #47 531.1 bool use_swa = window_left != -1; 2025-09-07T10:31:22.1206488Z #47 531.1 ^ 2025-09-07T10:31:22.1207013Z #47 531.1 2025-09-07T10:31:22.1207830Z #47 531.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.1208794Z #47 531.1 2025-09-07T10:31:22.2791788Z #47 531.2 [214/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:31:22.2809757Z #47 531.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.2812948Z #47 531.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:22.2813816Z #47 531.2 ^ 2025-09-07T10:31:22.2814273Z #47 531.2 2025-09-07T10:31:22.2814978Z #47 531.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.2815869Z #47 531.2 2025-09-07T10:31:22.2818236Z #47 531.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.2821470Z #47 531.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:22.2822305Z #47 531.2 ^ 2025-09-07T10:31:22.2822797Z #47 531.2 2025-09-07T10:31:22.2823514Z #47 531.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.2824398Z #47 531.2 2025-09-07T10:31:22.2826908Z #47 531.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.2830065Z #47 531.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:22.2830935Z #47 531.2 ^ 2025-09-07T10:31:22.2831417Z #47 531.2 2025-09-07T10:31:22.2832142Z #47 531.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.2833028Z #47 531.2 2025-09-07T10:31:22.2835515Z #47 531.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.2838503Z #47 531.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:22.2839370Z #47 531.2 ^ 2025-09-07T10:31:22.2839841Z #47 531.2 2025-09-07T10:31:22.2840701Z #47 531.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.2841607Z 
#47 531.2 2025-09-07T10:31:22.2844194Z #47 531.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.2847151Z #47 531.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:22.2848019Z #47 531.2 ^ 2025-09-07T10:31:22.2848477Z #47 531.2 2025-09-07T10:31:22.2849449Z #47 531.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.2850352Z #47 531.2 2025-09-07T10:31:22.9327727Z #47 531.9 [215/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:31:22.9346631Z #47 531.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.9350355Z #47 531.9 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:22.9351371Z #47 531.9 ^ 2025-09-07T10:31:22.9351964Z #47 531.9 2025-09-07T10:31:22.9352802Z #47 531.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.9353822Z #47 531.9 2025-09-07T10:31:22.9356422Z #47 531.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.9359494Z #47 531.9 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:22.9360514Z #47 531.9 ^ 2025-09-07T10:31:22.9361077Z #47 531.9 2025-09-07T10:31:22.9362128Z #47 531.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.9363201Z #47 
531.9 2025-09-07T10:31:22.9365971Z #47 531.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.9369081Z #47 531.9 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:22.9370074Z #47 531.9 ^ 2025-09-07T10:31:22.9370656Z #47 531.9 2025-09-07T10:31:22.9371619Z #47 531.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.9372594Z #47 531.9 2025-09-07T10:31:22.9375164Z #47 531.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.9378286Z #47 531.9 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:22.9379263Z #47 531.9 ^ 2025-09-07T10:31:22.9379856Z #47 531.9 2025-09-07T10:31:22.9380653Z #47 531.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:22.9381644Z #47 531.9 2025-09-07T10:31:22.9384221Z #47 531.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:22.9387377Z #47 531.9 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:22.9388375Z #47 531.9 ^ 2025-09-07T10:31:22.9388957Z #47 531.9 2025-09-07T10:31:22.9389958Z #47 531.9 Remark: The warnings can be suppressed with 
"-diag-suppress " 2025-09-07T10:31:22.9390956Z #47 531.9 2025-09-07T10:31:24.2381658Z #47 533.2 [216/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:31:24.2401348Z #47 
533.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.2404398Z #47 533.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:24.2405267Z #47 533.2 ^ 2025-09-07T10:31:24.2405747Z #47 533.2 2025-09-07T10:31:24.2406449Z #47 533.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.2407336Z #47 533.2 2025-09-07T10:31:24.2409797Z #47 533.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.2413109Z #47 533.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:24.2413994Z #47 533.2 ^ 2025-09-07T10:31:24.2414454Z #47 533.2 2025-09-07T10:31:24.2415199Z #47 533.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.2416128Z #47 533.2 2025-09-07T10:31:24.2418815Z #47 533.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.2422007Z #47 533.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:24.2423431Z #47 533.2 ^ 2025-09-07T10:31:24.2423918Z #47 533.2 2025-09-07T10:31:24.2424671Z #47 533.2 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:31:24.2425850Z #47 533.2 2025-09-07T10:31:24.2428583Z #47 533.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.2431766Z #47 533.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:24.2432680Z #47 533.2 ^ 2025-09-07T10:31:24.2433179Z #47 533.2 2025-09-07T10:31:24.2433979Z #47 533.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.2434916Z #47 533.2 2025-09-07T10:31:24.2437571Z #47 533.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.2440918Z #47 533.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:31:24.2441805Z #47 533.2 ^ 2025-09-07T10:31:24.2442289Z #47 533.2 2025-09-07T10:31:24.2443006Z #47 533.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.2443939Z #47 533.2 2025-09-07T10:31:24.7962458Z #47 533.8 [217/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:31:24.7982227Z #47 533.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.7985539Z #47 533.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:24.7986535Z #47 533.8 ^ 2025-09-07T10:31:24.7987054Z #47 533.8 2025-09-07T10:31:24.7987829Z #47 533.8 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:31:24.7989008Z #47 533.8 2025-09-07T10:31:24.7991681Z #47 533.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.7994993Z #47 533.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:24.7996021Z #47 533.8 ^ 2025-09-07T10:31:24.7996549Z #47 533.8 2025-09-07T10:31:24.7997301Z #47 533.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.7998261Z #47 533.8 2025-09-07T10:31:24.8000960Z #47 533.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.8004499Z #47 533.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:24.8005512Z #47 533.8 ^ 2025-09-07T10:31:24.8006008Z #47 533.8 2025-09-07T10:31:24.8006785Z #47 533.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.8007724Z #47 533.8 2025-09-07T10:31:24.8010292Z #47 533.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.8013898Z #47 533.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:24.8014954Z #47 533.8 ^ 2025-09-07T10:31:24.8015453Z #47 533.8 
2025-09-07T10:31:24.8016213Z #47 533.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.8017183Z #47 533.8 2025-09-07T10:31:24.8019961Z #47 533.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:24.8023305Z #47 533.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:24.8024314Z #47 533.8 ^ 2025-09-07T10:31:24.8024799Z #47 533.8 2025-09-07T10:31:24.8025569Z #47 533.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:24.8026504Z #47 533.8 2025-09-07T10:31:26.6801862Z #47 535.6 [218/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false 
-gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:31:26.6818572Z #47 535.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:26.6821543Z #47 535.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:26.6822317Z #47 535.6 ^ 2025-09-07T10:31:26.6822721Z #47 535.6 2025-09-07T10:31:26.6823365Z #47 535.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:26.6824164Z #47 535.6 2025-09-07T10:31:26.6826393Z #47 535.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:26.6829256Z #47 535.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:26.6830049Z #47 535.6 ^ 2025-09-07T10:31:26.6830476Z #47 535.6 2025-09-07T10:31:26.6831105Z 
#47 535.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:26.6831899Z #47 535.6 2025-09-07T10:31:26.6834270Z #47 535.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:26.6836913Z #47 535.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:26.6837853Z #47 535.6 ^ 2025-09-07T10:31:26.6838344Z #47 535.6 2025-09-07T10:31:26.6853893Z #47 535.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:26.6854891Z #47 535.6 2025-09-07T10:31:26.6857169Z #47 535.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:26.6859849Z #47 535.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:26.6860618Z #47 535.6 ^ 2025-09-07T10:31:26.6861045Z #47 535.6 2025-09-07T10:31:26.6861691Z #47 535.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:26.6862523Z #47 535.6 2025-09-07T10:31:26.6864768Z #47 535.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:26.6867425Z #47 535.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:31:26.6868421Z #47 535.6 ^ 
2025-09-07T10:31:26.6868837Z #47 535.6 2025-09-07T10:31:26.6869503Z #47 535.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:26.6870295Z #47 535.6 2025-09-07T10:31:34.3722872Z #47 543.3 [219/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:31:34.3743160Z #47 543.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:34.3746637Z #47 543.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:34.3747658Z #47 543.3 ^ 2025-09-07T10:31:34.3748154Z #47 543.3 2025-09-07T10:31:34.3749176Z #47 543.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:34.3750096Z #47 543.3 2025-09-07T10:31:34.3752757Z #47 543.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:34.3756180Z #47 543.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:34.3757272Z #47 543.3 ^ 2025-09-07T10:31:34.3757813Z #47 543.3 2025-09-07T10:31:34.3758630Z #47 543.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:34.3759659Z #47 543.3 2025-09-07T10:31:34.3762171Z #47 543.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:34.3765557Z #47 543.3 constexpr auto use_custom_mask = 
MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:34.3766861Z #47 543.3 ^ 2025-09-07T10:31:34.3767364Z #47 543.3 2025-09-07T10:31:34.3768155Z #47 543.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:34.3769104Z #47 543.3 2025-09-07T10:31:34.3771932Z #47 543.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:34.3775491Z #47 543.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:34.3776573Z #47 543.3 ^ 2025-09-07T10:31:34.3777119Z #47 543.3 2025-09-07T10:31:34.3777872Z #47 543.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:34.3779000Z #47 543.3 2025-09-07T10:31:34.3781801Z #47 543.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:31:34.3785354Z #47 543.3 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:31:34.3786448Z #47 543.3 ^ 2025-09-07T10:31:34.3786901Z #47 543.3 2025-09-07T10:31:34.3787559Z #47 543.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:34.3788458Z #47 543.3 2025-09-07T10:31:35.5903445Z #47 544.6 [220/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o 2025-09-07T10:31:35.5919945Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.5939297Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant 
= AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.5956820Z #47 544.6 ^ 2025-09-07T10:31:35.5957261Z #47 544.6 2025-09-07T10:31:35.5957877Z #47 544.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:35.5958670Z #47 544.6 2025-09-07T10:31:35.5960632Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.5980860Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6000515Z #47 544.6 ^ 2025-09-07T10:31:35.6001011Z #47 544.6 2025-09-07T10:31:35.6003550Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6025496Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6045588Z #47 544.6 ^ 2025-09-07T10:31:35.6046098Z #47 544.6 2025-09-07T10:31:35.6048543Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6068960Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6086440Z #47 544.6 ^ 2025-09-07T10:31:35.6086895Z #47 544.6 2025-09-07T10:31:35.6088877Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6108444Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6126593Z #47 544.6 ^ 2025-09-07T10:31:35.6127036Z #47 544.6 2025-09-07T10:31:35.6129053Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6186172Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6205613Z #47 544.6 ^ 2025-09-07T10:31:35.6206071Z #47 544.6 2025-09-07T10:31:35.6208352Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6228708Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6246825Z #47 544.6 ^ 2025-09-07T10:31:35.6247282Z #47 544.6 2025-09-07T10:31:35.6251446Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6275643Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6295171Z #47 544.6 ^ 2025-09-07T10:31:35.6295627Z #47 544.6 2025-09-07T10:31:35.6297765Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6318877Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6337218Z #47 544.6 ^ 2025-09-07T10:31:35.6337686Z #47 544.6 2025-09-07T10:31:35.6338358Z #47 544.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:35.6339145Z #47 544.6 2025-09-07T10:31:35.6341208Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6360775Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6380254Z #47 544.6 ^ 2025-09-07T10:31:35.6380730Z #47 544.6 2025-09-07T10:31:35.6383009Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6404231Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6424995Z #47 544.6 ^ 2025-09-07T10:31:35.6425628Z #47 544.6 2025-09-07T10:31:35.6427895Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6447365Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6465015Z #47 544.6 ^ 2025-09-07T10:31:35.6465444Z #47 544.6 2025-09-07T10:31:35.6467652Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6487219Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6505324Z #47 544.6 ^ 2025-09-07T10:31:35.6505779Z #47 544.6 2025-09-07T10:31:35.6508099Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6528317Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6547216Z #47 544.6 ^ 2025-09-07T10:31:35.6547650Z #47 544.6 2025-09-07T10:31:35.6550144Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6569203Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6587423Z #47 544.6 ^ 2025-09-07T10:31:35.6587858Z #47 544.6 2025-09-07T10:31:35.6589861Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6609789Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6629230Z #47 544.6 ^ 2025-09-07T10:31:35.6629688Z #47 544.6 2025-09-07T10:31:35.6631993Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6652839Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6671763Z #47 544.6 ^ 2025-09-07T10:31:35.6672226Z #47 544.6 2025-09-07T10:31:35.6672930Z #47 544.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:35.6673847Z #47 544.6 2025-09-07T10:31:35.6676202Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6699113Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6719878Z #47 544.6 ^ 2025-09-07T10:31:35.6720347Z #47 544.6 2025-09-07T10:31:35.6722546Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6743518Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6762224Z #47 544.6 ^ 2025-09-07T10:31:35.6762657Z #47 544.6 2025-09-07T10:31:35.6764711Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6784692Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6802838Z #47 544.6 ^ 2025-09-07T10:31:35.6803316Z #47 544.6 2025-09-07T10:31:35.6805659Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6827138Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6846403Z #47 544.6 ^ 2025-09-07T10:31:35.6846875Z #47 544.6 2025-09-07T10:31:35.6849382Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6871489Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6891602Z #47 544.6 ^ 2025-09-07T10:31:35.6892071Z #47 544.6 2025-09-07T10:31:35.6894333Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6915209Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6935493Z #47 544.6 ^ 2025-09-07T10:31:35.6935941Z #47 544.6 2025-09-07T10:31:35.6938084Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.6959051Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.6978957Z #47 544.6 ^ 2025-09-07T10:31:35.6979413Z #47 544.6 2025-09-07T10:31:35.6981601Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7001862Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7021227Z #47 544.6 ^ 2025-09-07T10:31:35.7021692Z #47 544.6 2025-09-07T10:31:35.7022370Z #47 544.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:35.7023185Z #47 544.6 2025-09-07T10:31:35.7025329Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7044996Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7063951Z #47 544.6 ^ 2025-09-07T10:31:35.7064394Z #47 544.6 2025-09-07T10:31:35.7066698Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7086264Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7104636Z #47 544.6 ^ 2025-09-07T10:31:35.7105051Z #47 544.6 2025-09-07T10:31:35.7107238Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7126210Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7144219Z #47 544.6 ^ 2025-09-07T10:31:35.7144660Z #47 544.6 2025-09-07T10:31:35.7146751Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7168800Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7187171Z #47 544.6 ^ 2025-09-07T10:31:35.7187571Z #47 544.6 2025-09-07T10:31:35.7189700Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7209717Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7228331Z #47 544.6 ^ 2025-09-07T10:31:35.7228749Z #47 544.6 2025-09-07T10:31:35.7230703Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7249812Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7268369Z #47 544.6 ^ 2025-09-07T10:31:35.7268795Z #47 544.6 2025-09-07T10:31:35.7270920Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7289859Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7309353Z #47 544.6 ^ 2025-09-07T10:31:35.7309769Z #47 544.6 2025-09-07T10:31:35.7311746Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7330881Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7347818Z #47 544.6 ^ 2025-09-07T10:31:35.7348232Z #47 544.6 2025-09-07T10:31:35.7349102Z #47 544.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:31:35.7350077Z #47 544.6 2025-09-07T10:31:35.7352118Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7373638Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7390599Z #47 544.6 ^ 2025-09-07T10:31:35.7391031Z #47 544.6 2025-09-07T10:31:35.7392969Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7411352Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7428539Z #47 544.6 ^ 2025-09-07T10:31:35.7428995Z #47 544.6 2025-09-07T10:31:35.7431204Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7450232Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7467887Z #47 544.6 ^ 2025-09-07T10:31:35.7468323Z #47 544.6 2025-09-07T10:31:35.7470311Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7489804Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7507792Z #47 544.6 ^ 2025-09-07T10:31:35.7508207Z #47 544.6 2025-09-07T10:31:35.7510187Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7529706Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7547278Z #47 544.6 ^ 2025-09-07T10:31:35.7547695Z #47 544.6 2025-09-07T10:31:35.7549972Z #47 544.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7567991Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7584442Z #47 544.6 ^ 2025-09-07T10:31:35.7584844Z #47 544.6 2025-09-07T10:31:35.7586878Z #47 544.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:35.7604469Z #47 544.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:35.7621000Z #47 544.6 ^ 2025-09-07T10:31:35.7621394Z #47 544.6 2025-09-07T10:31:39.6033578Z #47 548.6 [221/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:31:47.7642738Z #47 556.7 [222/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr 
-static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o 2025-09-07T10:31:47.7658542Z #47 556.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu(115): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:31:47.7660671Z #47 556.7 bool use_swa = window_left != -1; 2025-09-07T10:31:47.7661145Z #47 556.7 ^ 2025-09-07T10:31:47.7661467Z #47 556.7 2025-09-07T10:31:47.7662041Z #47 556.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:47.7662739Z #47 556.7 2025-09-07T10:31:50.7323336Z #47 559.7 [223/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:31:52.5581641Z #47 561.5 [224/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:31:56.1299444Z #47 565.1 [225/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:31:57.5239273Z #47 566.5 [226/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:31:57.7632581Z #47 566.7 [227/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:31:58.3402680Z #47 567.3 [228/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o 2025-09-07T10:31:58.4900685Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.4923396Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.4943843Z #47 567.3 ^ 2025-09-07T10:31:58.4944343Z #47 567.3 2025-09-07T10:31:58.4945047Z #47 567.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:58.4945931Z #47 567.3 2025-09-07T10:31:58.4948273Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.4970506Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.4990690Z #47 567.3 ^ 2025-09-07T10:31:58.4991196Z #47 567.3 2025-09-07T10:31:58.4993591Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5015606Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5036643Z #47 567.3 ^ 2025-09-07T10:31:58.5037168Z #47 567.3 2025-09-07T10:31:58.5039611Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5061709Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5081746Z #47 567.3 ^ 2025-09-07T10:31:58.5082230Z #47 567.3 2025-09-07T10:31:58.5084606Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5107291Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5127928Z #47 567.3 ^ 2025-09-07T10:31:58.5128522Z #47 567.3 2025-09-07T10:31:58.5130845Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5155206Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5177722Z #47 567.3 ^ 2025-09-07T10:31:58.5178225Z #47 567.3 2025-09-07T10:31:58.5180515Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5202683Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5223939Z #47 567.3 ^ 2025-09-07T10:31:58.5224458Z #47 567.3 2025-09-07T10:31:58.5226834Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5249501Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5270099Z #47 567.3 ^ 2025-09-07T10:31:58.5270624Z #47 567.3 2025-09-07T10:31:58.5273038Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5295265Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5315114Z #47 567.3 ^ 2025-09-07T10:31:58.5315543Z #47 567.3 2025-09-07T10:31:58.5316261Z #47 567.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:58.5317120Z #47 567.3 2025-09-07T10:31:58.5319039Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5341195Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5361983Z #47 567.3 ^ 2025-09-07T10:31:58.5362499Z #47 567.3 2025-09-07T10:31:58.5364922Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5387674Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5407998Z #47 567.3 ^ 2025-09-07T10:31:58.5408522Z #47 567.3 2025-09-07T10:31:58.5411090Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5432624Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5453266Z #47 567.3 ^ 2025-09-07T10:31:58.5453722Z #47 567.3 2025-09-07T10:31:58.5456006Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5478355Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5499629Z #47 567.3 ^ 2025-09-07T10:31:58.5500114Z #47 567.3 2025-09-07T10:31:58.5502475Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5525454Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5546217Z #47 567.3 ^ 2025-09-07T10:31:58.5546730Z #47 567.3 2025-09-07T10:31:58.5549439Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5572505Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5593987Z #47 567.3 ^ 2025-09-07T10:31:58.5594515Z #47 567.3 2025-09-07T10:31:58.5597031Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5619687Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5640741Z #47 567.3 ^ 2025-09-07T10:31:58.5641230Z #47 567.3 2025-09-07T10:31:58.5643797Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5664262Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5683992Z #47 567.3 ^ 2025-09-07T10:31:58.5684432Z #47 567.3 2025-09-07T10:31:58.5685224Z #47 567.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:58.5686068Z #47 567.3 2025-09-07T10:31:58.5688712Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5710957Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5730079Z #47 567.3 ^ 2025-09-07T10:31:58.5730579Z #47 567.3 2025-09-07T10:31:58.5733125Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5754867Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5775758Z #47 567.3 ^ 2025-09-07T10:31:58.5776264Z #47 567.3 2025-09-07T10:31:58.5778724Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5801839Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5823546Z #47 567.3 ^ 2025-09-07T10:31:58.5824256Z #47 567.3 2025-09-07T10:31:58.5826825Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5850545Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5871175Z #47 567.3 ^ 2025-09-07T10:31:58.5871909Z #47 567.3 2025-09-07T10:31:58.5874152Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5896109Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5916402Z #47 567.3 ^ 2025-09-07T10:31:58.5916882Z #47 567.3 2025-09-07T10:31:58.5919245Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5940996Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.5962213Z #47 567.3 ^ 2025-09-07T10:31:58.5962746Z #47 567.3 2025-09-07T10:31:58.5965104Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.5988743Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6009442Z #47 567.3 ^ 2025-09-07T10:31:58.6009933Z #47 567.3 2025-09-07T10:31:58.6012729Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6033984Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6051941Z #47 567.3 ^ 2025-09-07T10:31:58.6052341Z #47 567.3 2025-09-07T10:31:58.6052967Z #47 567.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:31:58.6053673Z #47 567.3 2025-09-07T10:31:58.6055428Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6076111Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6096549Z #47 567.3 ^ 2025-09-07T10:31:58.6097069Z #47 567.3 2025-09-07T10:31:58.6099420Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6121078Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6141445Z #47 567.3 ^ 2025-09-07T10:31:58.6141950Z #47 567.3 2025-09-07T10:31:58.6144339Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6167236Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6187907Z #47 567.3 ^ 2025-09-07T10:31:58.6188443Z #47 567.3 2025-09-07T10:31:58.6190803Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6209530Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6227036Z #47 567.3 ^ 2025-09-07T10:31:58.6227470Z #47 567.3 2025-09-07T10:31:58.6229392Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6250577Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6270089Z #47 567.3 ^ 2025-09-07T10:31:58.6270504Z #47 567.3 2025-09-07T10:31:58.6272398Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6293248Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6314493Z #47 567.3 ^ 2025-09-07T10:31:58.6315039Z #47 567.3 2025-09-07T10:31:58.6317533Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6338525Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6356190Z #47 567.3 ^ 2025-09-07T10:31:58.6356635Z #47 567.3 2025-09-07T10:31:58.6358602Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6378883Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6398185Z #47 567.3 ^ 2025-09-07T10:31:58.6398651Z #47 567.3 2025-09-07T10:31:58.6399296Z #47 567.3 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:31:58.6400125Z #47 567.3 2025-09-07T10:31:58.6402499Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6424370Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6445583Z #47 567.3 ^ 2025-09-07T10:31:58.6446136Z #47 567.3 2025-09-07T10:31:58.6448557Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6471063Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6487061Z #47 567.3 ^ 2025-09-07T10:31:58.6487490Z #47 567.3 2025-09-07T10:31:58.6489290Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6509340Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6529606Z #47 567.3 ^ 2025-09-07T10:31:58.6530092Z #47 567.3 2025-09-07T10:31:58.6532457Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6555756Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6577134Z #47 567.3 ^ 2025-09-07T10:31:58.6577676Z #47 567.3 2025-09-07T10:31:58.6580321Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6603005Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6623970Z #47 567.3 ^ 2025-09-07T10:31:58.6624477Z #47 567.3 2025-09-07T10:31:58.6626727Z #47 567.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6645283Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6662929Z #47 567.3 ^ 2025-09-07T10:31:58.6663373Z #47 567.3 2025-09-07T10:31:58.6665417Z #47 567.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:31:58.6686715Z #47 567.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:31:58.6706293Z #47 567.3 ^ 2025-09-07T10:31:58.6706818Z #47 567.3 2025-09-07T10:31:59.3630564Z #47 568.3 [229/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:32:02.7382283Z #47 571.7 [230/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:32:04.0404246Z #47 573.0 [231/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:32:05.4561656Z #47 574.4 [232/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:32:06.0601350Z #47 575.0 [233/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:32:14.7629600Z #47 583.7 [234/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:32:23.6231445Z #47 592.6 [235/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:32:26.1409670Z #47 595.1 [236/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr 
-static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:32:26.1430012Z #47 595.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:26.1433314Z #47 595.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:26.1434247Z #47 595.1 ^ 2025-09-07T10:32:26.1434755Z #47 595.1 2025-09-07T10:32:26.1435535Z #47 595.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:26.1436503Z #47 595.1 2025-09-07T10:32:26.1439259Z #47 595.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:26.1442527Z #47 595.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:26.1443441Z #47 595.1 ^ 2025-09-07T10:32:26.1443938Z 
#47 595.1 2025-09-07T10:32:26.1444719Z #47 595.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:26.1445685Z #47 595.1 2025-09-07T10:32:26.1448360Z #47 595.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:26.1452058Z #47 595.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:26.1452875Z #47 595.1 ^ 2025-09-07T10:32:26.1453339Z #47 595.1 2025-09-07T10:32:26.1454035Z #47 595.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:26.1454887Z #47 595.1 2025-09-07T10:32:26.1457346Z #47 595.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:26.1460490Z #47 595.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:26.1461390Z #47 595.1 ^ 2025-09-07T10:32:26.1461880Z #47 595.1 2025-09-07T10:32:26.1462631Z #47 595.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:26.1463794Z #47 595.1 2025-09-07T10:32:26.1466481Z #47 595.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:26.1469391Z #47 595.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 
2025-09-07T10:32:26.1470222Z #47 595.1 ^ 2025-09-07T10:32:26.1470674Z #47 595.1 2025-09-07T10:32:26.1471414Z #47 595.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:26.1472364Z #47 595.1 2025-09-07T10:32:26.6477583Z #47 595.6 [237/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:32:28.9112320Z #47 597.9 [238/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:32:28.9134965Z #47 597.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:28.9138405Z #47 597.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:28.9139300Z #47 597.9 ^ 2025-09-07T10:32:28.9139780Z #47 597.9 2025-09-07T10:32:28.9140515Z #47 597.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:28.9141424Z #47 597.9 2025-09-07T10:32:28.9144267Z #47 597.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:28.9147615Z #47 597.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:28.9148517Z #47 597.9 ^ 2025-09-07T10:32:28.9149189Z #47 597.9 2025-09-07T10:32:28.9149835Z #47 597.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:28.9150710Z #47 597.9 2025-09-07T10:32:28.9153318Z #47 597.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:28.9156501Z #47 597.9 constexpr auto use_custom_mask = MaskMode::kNone == 
MaskMode::kCustom; 2025-09-07T10:32:28.9157390Z #47 597.9 ^ 2025-09-07T10:32:28.9157900Z #47 597.9 2025-09-07T10:32:28.9158646Z #47 597.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:28.9159595Z #47 597.9 2025-09-07T10:32:28.9162307Z #47 597.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:28.9165774Z #47 597.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:28.9166666Z #47 597.9 ^ 2025-09-07T10:32:28.9167121Z #47 597.9 2025-09-07T10:32:28.9167846Z #47 597.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:28.9168721Z #47 597.9 2025-09-07T10:32:28.9171077Z #47 597.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:28.9173907Z #47 597.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:32:28.9174722Z #47 597.9 ^ 2025-09-07T10:32:28.9175208Z #47 597.9 2025-09-07T10:32:28.9175899Z #47 597.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:28.9177018Z #47 597.9 2025-09-07T10:32:29.2555070Z #47 598.2 [239/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:32:31.0624926Z #47 600.0 [240/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:32:32.4500705Z #47 601.4 [241/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:32:32.4515692Z #47 601.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.4518036Z #47 601.4 
constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.4518728Z #47 601.4 ^ 2025-09-07T10:32:32.4519093Z #47 601.4 2025-09-07T10:32:32.4519661Z #47 601.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.4520367Z #47 601.4 2025-09-07T10:32:32.4522296Z #47 601.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.4524651Z #47 601.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.4525325Z #47 601.4 ^ 2025-09-07T10:32:32.4525998Z #47 601.4 2025-09-07T10:32:32.4526572Z #47 601.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.4527250Z #47 601.4 2025-09-07T10:32:32.4529215Z #47 601.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.4531607Z #47 601.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.4532297Z #47 601.4 ^ 2025-09-07T10:32:32.4532676Z #47 601.4 2025-09-07T10:32:32.4533219Z #47 601.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.4533912Z #47 601.4 2025-09-07T10:32:32.4535873Z #47 601.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but 
never referenced 2025-09-07T10:32:32.4538334Z #47 601.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.4539006Z #47 601.4 ^ 2025-09-07T10:32:32.4539360Z #47 601.4 2025-09-07T10:32:32.4539932Z #47 601.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.4540607Z #47 601.4 2025-09-07T10:32:32.4542632Z #47 601.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.4544894Z #47 601.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.4545570Z #47 601.4 ^ 2025-09-07T10:32:32.4545939Z #47 601.4 2025-09-07T10:32:32.4546558Z #47 601.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.4547250Z #47 601.4 2025-09-07T10:32:32.6797997Z #47 601.6 [242/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:32:32.6817374Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.6820179Z #47 601.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.6821096Z #47 601.6 ^ 2025-09-07T10:32:32.6821528Z #47 601.6 2025-09-07T10:32:32.6822312Z #47 601.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.6823170Z #47 601.6 2025-09-07T10:32:32.6825698Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 
2025-09-07T10:32:32.6829185Z #47 601.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.6830046Z #47 601.6 ^ 2025-09-07T10:32:32.6830540Z #47 601.6 2025-09-07T10:32:32.6831298Z #47 601.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.6832233Z #47 601.6 2025-09-07T10:32:32.6835172Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.6838427Z #47 601.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.6839370Z #47 601.6 ^ 2025-09-07T10:32:32.6839858Z #47 601.6 2025-09-07T10:32:32.6840632Z #47 601.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.6841730Z #47 601.6 2025-09-07T10:32:32.6844168Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.6847212Z #47 601.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.6848108Z #47 601.6 ^ 2025-09-07T10:32:32.6848601Z #47 601.6 2025-09-07T10:32:32.6849689Z #47 601.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.6850597Z #47 601.6 2025-09-07T10:32:32.6853305Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:32.6856462Z #47 601.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:32:32.6857365Z #47 601.6 ^ 2025-09-07T10:32:32.6857846Z #47 601.6 2025-09-07T10:32:32.6858591Z #47 601.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:32.6859537Z #47 601.6 2025-09-07T10:32:36.1700214Z #47 605.1 [243/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:32:36.1714821Z #47 605.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.1717333Z #47 605.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.1718033Z #47 605.1 ^ 2025-09-07T10:32:36.1718414Z #47 605.1 2025-09-07T10:32:36.1718976Z #47 605.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.1719778Z #47 605.1 2025-09-07T10:32:36.1721720Z #47 605.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.1723999Z #47 605.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.1724671Z #47 605.1 ^ 2025-09-07T10:32:36.1725032Z #47 605.1 2025-09-07T10:32:36.1725610Z #47 605.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.1726294Z #47 605.1 2025-09-07T10:32:36.1728224Z #47 605.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.1730511Z #47 605.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.1731325Z #47 605.1 ^ 2025-09-07T10:32:36.1731697Z #47 605.1 2025-09-07T10:32:36.1732263Z #47 605.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.1732940Z #47 605.1 2025-09-07T10:32:36.1734891Z #47 605.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.1737296Z #47 605.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.1737974Z #47 605.1 ^ 2025-09-07T10:32:36.1738347Z #47 605.1 2025-09-07T10:32:36.1738898Z #47 605.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.1739586Z #47 605.1 2025-09-07T10:32:36.1741493Z #47 605.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.1743771Z #47 605.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.1744449Z #47 605.1 ^ 2025-09-07T10:32:36.1744820Z #47 605.1 2025-09-07T10:32:36.1745390Z #47 605.1 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:32:36.1746235Z #47 605.1 2025-09-07T10:32:36.3490092Z #47 605.2 [244/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:32:36.3509911Z #47 605.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.3513152Z #47 605.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.3514093Z #47 605.2 ^ 2025-09-07T10:32:36.3514580Z #47 605.2 2025-09-07T10:32:36.3515350Z #47 605.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.3516304Z #47 605.2 2025-09-07T10:32:36.3518956Z #47 605.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.3522569Z #47 605.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.3523497Z #47 605.2 ^ 2025-09-07T10:32:36.3523976Z #47 605.2 2025-09-07T10:32:36.3524742Z #47 605.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.3525710Z #47 605.2 2025-09-07T10:32:36.3528441Z #47 605.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.3531815Z #47 605.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.3532733Z #47 605.2 ^ 2025-09-07T10:32:36.3533241Z #47 605.2 2025-09-07T10:32:36.3533982Z #47 605.2 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:32:36.3535256Z #47 605.2 2025-09-07T10:32:36.3537965Z #47 605.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.3541219Z #47 605.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.3542135Z #47 605.2 ^ 2025-09-07T10:32:36.3542664Z #47 605.2 2025-09-07T10:32:36.3543413Z #47 605.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.3544374Z #47 605.2 2025-09-07T10:32:36.3547253Z #47 605.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:36.3550567Z #47 605.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:32:36.3551623Z #47 605.2 ^ 2025-09-07T10:32:36.3552074Z #47 605.2 2025-09-07T10:32:36.3552764Z #47 605.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:36.3553678Z #47 605.2 2025-09-07T10:32:37.2491521Z #47 606.2 [245/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:32:37.4504142Z #47 606.4 [246/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:32:39.7801483Z #47 608.7 [247/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:32:47.8951076Z #47 616.9 [248/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:32:48.0759417Z #47 616.9 [249/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:32:48.7749328Z #47 617.7 [250/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:32:49.5688322Z #47 618.5 [251/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:32:50.4609416Z #47 619.4 [252/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:32:51.8015454Z #47 620.8 [253/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:32:51.8036527Z #47 620.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:51.8040347Z #47 620.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:51.8041446Z #47 620.8 ^ 2025-09-07T10:32:51.8041976Z #47 620.8 2025-09-07T10:32:51.8042797Z #47 620.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:51.8043790Z #47 620.8 2025-09-07T10:32:51.8046634Z #47 620.8 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:51.8050133Z #47 620.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:51.8051095Z #47 620.8 ^ 2025-09-07T10:32:51.8051560Z #47 620.8 2025-09-07T10:32:51.8052251Z #47 620.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:51.8053306Z #47 620.8 2025-09-07T10:32:51.8055783Z #47 620.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:51.8058804Z #47 620.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:51.8059735Z #47 620.8 ^ 2025-09-07T10:32:51.8060187Z #47 620.8 2025-09-07T10:32:51.8060903Z #47 620.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:51.8061841Z #47 620.8 2025-09-07T10:32:51.8064684Z #47 620.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:51.8067999Z #47 620.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:51.8068912Z #47 620.8 ^ 2025-09-07T10:32:51.8069336Z #47 620.8 2025-09-07T10:32:51.8070028Z #47 620.8 Remark: The warnings can be suppressed with 
"-diag-suppress " 2025-09-07T10:32:51.8070882Z #47 620.8 2025-09-07T10:32:51.8073412Z #47 620.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:51.8076586Z #47 620.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:51.8077622Z #47 620.8 ^ 2025-09-07T10:32:51.8078089Z #47 620.8 2025-09-07T10:32:51.8078835Z #47 620.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:51.8079788Z #47 620.8 2025-09-07T10:32:52.5682015Z #47 621.5 [254/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:32:52.5700310Z #47 621.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:32:52.5703148Z #47 621.5 bool use_swa = window_left != -1; 2025-09-07T10:32:52.5703803Z #47 621.5 ^ 2025-09-07T10:32:52.5704247Z #47 621.5 2025-09-07T10:32:52.5705038Z #47 621.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:52.5705987Z #47 621.5 2025-09-07T10:32:52.5708449Z #47 621.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:32:52.5711371Z #47 621.5 bool use_swa = window_left != -1; 2025-09-07T10:32:52.5712015Z #47 621.5 ^ 2025-09-07T10:32:52.5712477Z #47 621.5 2025-09-07T10:32:55.4490031Z #47 624.4 [255/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:32:55.8311016Z #47 624.8 [256/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:32:55.8332653Z #47 624.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:55.8336109Z #47 624.8 
constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:55.8337150Z #47 624.8 ^ 2025-09-07T10:32:55.8337635Z #47 624.8 2025-09-07T10:32:55.8338323Z #47 624.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:55.8339137Z #47 624.8 2025-09-07T10:32:55.8342040Z #47 624.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:55.8345605Z #47 624.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:55.8346736Z #47 624.8 ^ 2025-09-07T10:32:55.8347282Z #47 624.8 2025-09-07T10:32:55.8348120Z #47 624.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:55.8349414Z #47 624.8 2025-09-07T10:32:55.8352395Z #47 624.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:55.8355878Z #47 624.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:55.8356924Z #47 624.8 ^ 2025-09-07T10:32:55.8357434Z #47 624.8 2025-09-07T10:32:55.8358274Z #47 624.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:55.8359465Z #47 624.8 2025-09-07T10:32:55.8362342Z #47 624.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:55.8365890Z #47 624.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:55.8366957Z #47 624.8 ^ 2025-09-07T10:32:55.8367475Z #47 624.8 2025-09-07T10:32:55.8368273Z #47 624.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:55.8369304Z #47 624.8 2025-09-07T10:32:55.8372330Z #47 624.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:32:55.8375906Z #47 624.8 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:32:55.8376962Z #47 624.8 ^ 2025-09-07T10:32:55.8377470Z #47 624.8 2025-09-07T10:32:55.8378294Z #47 624.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:32:55.8379278Z #47 624.8 2025-09-07T10:33:01.4733833Z #47 630.4 [257/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem 
/workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T10:33:01.4755468Z #47 630.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:01.4759036Z #47 630.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:01.4760023Z #47 630.4 ^ 2025-09-07T10:33:01.4760551Z #47 630.4 2025-09-07T10:33:01.4761410Z #47 630.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:01.4762696Z #47 630.4 2025-09-07T10:33:01.4765713Z #47 630.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): 
warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:01.4769240Z #47 630.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:01.4770181Z #47 630.4 ^ 2025-09-07T10:33:01.4770711Z #47 630.4 2025-09-07T10:33:01.4771608Z #47 630.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:01.4772560Z #47 630.4 2025-09-07T10:33:01.4775286Z #47 630.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:01.4778595Z #47 630.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:01.4779557Z #47 630.4 ^ 2025-09-07T10:33:01.4780083Z #47 630.4 2025-09-07T10:33:01.4780906Z #47 630.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:01.4781929Z #47 630.4 2025-09-07T10:33:01.4784938Z #47 630.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:01.4788463Z #47 630.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:01.4789581Z #47 630.4 ^ 2025-09-07T10:33:01.4790132Z #47 630.4 2025-09-07T10:33:01.4790960Z #47 630.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:01.4791997Z #47 630.4 2025-09-07T10:33:01.4795068Z #47 630.4 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:01.4798463Z #47 630.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:01.4799456Z #47 630.4 ^ 2025-09-07T10:33:01.4799956Z #47 630.4 2025-09-07T10:33:01.4800703Z #47 630.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:01.4801665Z #47 630.4 2025-09-07T10:33:03.4364250Z #47 632.4 [258/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 
-std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T10:33:03.4383893Z #47 632.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:03.4387338Z #47 632.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:03.4388227Z #47 632.4 ^ 2025-09-07T10:33:03.4388698Z #47 632.4 2025-09-07T10:33:03.4389407Z #47 632.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:03.4390277Z #47 632.4 2025-09-07T10:33:03.4392880Z #47 632.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:03.4395988Z #47 632.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:03.4397155Z #47 632.4 ^ 2025-09-07T10:33:03.4397651Z #47 632.4 2025-09-07T10:33:03.4398378Z #47 632.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:03.4399293Z #47 632.4 2025-09-07T10:33:03.4402180Z #47 632.4 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:03.4405417Z #47 632.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:03.4406368Z #47 632.4 ^ 2025-09-07T10:33:03.4406874Z #47 632.4 2025-09-07T10:33:03.4407672Z #47 632.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:03.4408645Z #47 632.4 2025-09-07T10:33:03.4411597Z #47 632.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:03.4414645Z #47 632.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:03.4415524Z #47 632.4 ^ 2025-09-07T10:33:03.4416023Z #47 632.4 2025-09-07T10:33:03.4416772Z #47 632.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:03.4417717Z #47 632.4 2025-09-07T10:33:03.4420291Z #47 632.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:03.4423459Z #47 632.4 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T10:33:03.4424557Z #47 632.4 ^ 2025-09-07T10:33:03.4425028Z #47 632.4 2025-09-07T10:33:03.4425761Z #47 632.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:03.4426622Z 
#47 632.4 2025-09-07T10:33:05.2065545Z #47 634.2 [259/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T10:33:05.2084317Z #47 634.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.2087400Z #47 634.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.2088236Z #47 634.2 ^ 2025-09-07T10:33:05.2088719Z #47 634.2 2025-09-07T10:33:05.2089426Z #47 634.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.2090379Z #47 634.2 2025-09-07T10:33:05.2093193Z #47 634.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.2096327Z #47 634.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.2097228Z #47 634.2 ^ 2025-09-07T10:33:05.2097706Z #47 634.2 2025-09-07T10:33:05.2098474Z #47 634.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.2099334Z #47 634.2 2025-09-07T10:33:05.2101905Z #47 634.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.2105116Z #47 634.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.2106304Z #47 634.2 ^ 2025-09-07T10:33:05.2106799Z #47 634.2 2025-09-07T10:33:05.2107538Z #47 634.2 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:33:05.2108437Z #47 634.2 2025-09-07T10:33:05.2111041Z #47 634.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.2113915Z #47 634.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.2114765Z #47 634.2 ^ 2025-09-07T10:33:05.2115233Z #47 634.2 2025-09-07T10:33:05.2115992Z #47 634.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.2116857Z #47 634.2 2025-09-07T10:33:05.2119289Z #47 634.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.2122474Z #47 634.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.2123334Z #47 634.2 ^ 2025-09-07T10:33:05.2123779Z #47 634.2 2025-09-07T10:33:05.2124396Z #47 634.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.2125166Z #47 634.2 2025-09-07T10:33:05.5914781Z #47 634.6 [260/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T10:33:05.5929088Z #47 634.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.5931657Z #47 634.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.5932556Z #47 634.6 ^ 2025-09-07T10:33:05.5932926Z #47 634.6 2025-09-07T10:33:05.5933501Z #47 634.6 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T10:33:05.5934185Z #47 634.6 2025-09-07T10:33:05.5936161Z #47 634.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.5938483Z #47 634.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.5939143Z #47 634.6 ^ 2025-09-07T10:33:05.5939519Z #47 634.6 2025-09-07T10:33:05.5940072Z #47 634.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.5940774Z #47 634.6 2025-09-07T10:33:05.5942728Z #47 634.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.5945250Z #47 634.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.5945929Z #47 634.6 ^ 2025-09-07T10:33:05.5946292Z #47 634.6 2025-09-07T10:33:05.5946879Z #47 634.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.5947669Z #47 634.6 2025-09-07T10:33:05.5950327Z #47 634.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.5952717Z #47 634.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.5953391Z #47 634.6 ^ 2025-09-07T10:33:05.5953775Z #47 634.6 2025-09-07T10:33:05.5954451Z #47 634.6 
Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.5955161Z #47 634.6 2025-09-07T10:33:05.5957214Z #47 634.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:05.5959853Z #47 634.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T10:33:05.5960534Z #47 634.6 ^ 2025-09-07T10:33:05.5960895Z #47 634.6 2025-09-07T10:33:05.5961473Z #47 634.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:05.5962180Z #47 634.6 2025-09-07T10:33:06.4905053Z #47 635.5 [261/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T10:33:06.4919749Z #47 635.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:06.4922285Z #47 635.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:06.4922966Z #47 635.5 ^ 2025-09-07T10:33:06.4923342Z #47 635.5 2025-09-07T10:33:06.4923919Z #47 635.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:06.4924611Z #47 635.5 2025-09-07T10:33:06.4926701Z #47 635.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:06.4929032Z #47 635.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:06.4929698Z #47 635.5 ^ 2025-09-07T10:33:06.4930072Z #47 635.5 2025-09-07T10:33:06.4930747Z #47 635.5 Remark: The 
warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:06.4931601Z #47 635.5 2025-09-07T10:33:06.4933549Z #47 635.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:06.4935849Z #47 635.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:06.4936520Z #47 635.5 ^ 2025-09-07T10:33:06.4936887Z #47 635.5 2025-09-07T10:33:06.4937467Z #47 635.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:06.4938186Z #47 635.5 2025-09-07T10:33:06.4940138Z #47 635.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:06.4942447Z #47 635.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:06.4943112Z #47 635.5 ^ 2025-09-07T10:33:06.4943488Z #47 635.5 2025-09-07T10:33:06.4944044Z #47 635.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:06.4944750Z #47 635.5 2025-09-07T10:33:06.4946700Z #47 635.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:06.4949382Z #47 635.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:06.4950058Z #47 635.5 ^ 2025-09-07T10:33:06.4950413Z #47 
635.5 2025-09-07T10:33:06.4950980Z #47 635.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:06.4951670Z #47 635.5 2025-09-07T10:33:08.6021425Z #47 637.6 [262/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T10:33:08.6035783Z #47 637.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:08.6038131Z #47 637.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:08.6038797Z #47 637.6 ^ 2025-09-07T10:33:08.6039187Z #47 637.6 2025-09-07T10:33:08.6039760Z #47 637.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:08.6040492Z #47 637.6 2025-09-07T10:33:08.6042433Z #47 637.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:08.6044727Z #47 637.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:08.6045400Z #47 637.6 ^ 2025-09-07T10:33:08.6045757Z #47 637.6 2025-09-07T10:33:08.6046317Z #47 637.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:08.6046994Z #47 637.6 2025-09-07T10:33:08.6049221Z #47 637.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:08.6051860Z #47 637.6 constexpr auto use_custom_mask = MaskMode::kCustom 
== MaskMode::kCustom; 2025-09-07T10:33:08.6052545Z #47 637.6 ^ 2025-09-07T10:33:08.6052967Z #47 637.6 2025-09-07T10:33:08.6053617Z #47 637.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:08.6054403Z #47 637.6 2025-09-07T10:33:08.6056348Z #47 637.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:08.6058637Z #47 637.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:08.6059327Z #47 637.6 ^ 2025-09-07T10:33:08.6059899Z #47 637.6 2025-09-07T10:33:08.6060475Z #47 637.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:08.6061152Z #47 637.6 2025-09-07T10:33:08.6063097Z #47 637.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:08.6065394Z #47 637.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T10:33:08.6066050Z #47 637.6 ^ 2025-09-07T10:33:08.6066421Z #47 637.6 2025-09-07T10:33:08.6066973Z #47 637.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:08.6067668Z #47 637.6 2025-09-07T10:33:08.7192172Z #47 637.7 [263/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:33:09.1344813Z #47 638.1 [264/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T10:33:09.1360273Z #47 638.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:09.1362759Z #47 638.1 
constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:09.1363504Z #47 638.1 ^ 2025-09-07T10:33:09.1363870Z #47 638.1 2025-09-07T10:33:09.1364554Z #47 638.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:09.1365268Z #47 638.1 2025-09-07T10:33:09.1367242Z #47 638.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:09.1369717Z #47 638.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:09.1370483Z #47 638.1 ^ 2025-09-07T10:33:09.1370884Z #47 638.1 2025-09-07T10:33:09.1371643Z #47 638.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:09.1372377Z #47 638.1 2025-09-07T10:33:09.1374541Z #47 638.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:09.1377040Z #47 638.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:09.1377803Z #47 638.1 ^ 2025-09-07T10:33:09.1378194Z #47 638.1 2025-09-07T10:33:09.1378755Z #47 638.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:09.1379475Z #47 638.1 2025-09-07T10:33:09.1381429Z #47 638.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:09.1384004Z #47 638.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:09.1384743Z #47 638.1 ^ 2025-09-07T10:33:09.1385106Z #47 638.1 2025-09-07T10:33:09.1385672Z #47 638.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:09.1386346Z #47 638.1 2025-09-07T10:33:09.1388291Z #47 638.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:09.1390637Z #47 638.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:09.1391523Z #47 638.1 ^ 2025-09-07T10:33:09.1391896Z #47 638.1 2025-09-07T10:33:09.1392446Z #47 638.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:09.1393151Z #47 638.1 2025-09-07T10:33:09.5492245Z #47 638.5 [265/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include 
-isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:33:13.5667234Z #47 642.5 [266/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:33:14.9365399Z #47 643.9 [267/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T10:33:14.9384988Z #47 643.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:14.9388291Z #47 643.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:14.9389275Z #47 643.9 ^ 2025-09-07T10:33:14.9389734Z #47 643.9 2025-09-07T10:33:14.9390487Z #47 643.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:14.9391405Z #47 643.9 2025-09-07T10:33:14.9394102Z #47 643.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:14.9397497Z #47 643.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:14.9398473Z #47 643.9 ^ 2025-09-07T10:33:14.9398942Z #47 643.9 2025-09-07T10:33:14.9399675Z #47 643.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:14.9400578Z #47 643.9 2025-09-07T10:33:14.9403214Z #47 643.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:14.9406354Z #47 643.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:14.9407332Z #47 643.9 ^ 2025-09-07T10:33:14.9407824Z #47 643.9 2025-09-07T10:33:14.9408697Z #47 643.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:14.9409613Z #47 643.9 2025-09-07T10:33:14.9412542Z #47 643.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:14.9415915Z #47 643.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:14.9416834Z #47 643.9 ^ 2025-09-07T10:33:14.9417299Z #47 643.9 2025-09-07T10:33:14.9418061Z #47 643.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:14.9418980Z #47 643.9 2025-09-07T10:33:14.9421716Z #47 643.9 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T10:33:14.9425210Z #47 643.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T10:33:14.9426357Z #47 643.9 ^ 2025-09-07T10:33:14.9426893Z #47 643.9 2025-09-07T10:33:14.9427755Z #47 643.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:14.9428699Z #47 643.9 2025-09-07T10:33:30.8497941Z #47 659.8 [268/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:33:30.8514924Z #47 659.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:33:30.8517467Z #47 659.8 bool use_swa = window_left != -1; 2025-09-07T10:33:30.8518012Z #47 659.8 ^ 2025-09-07T10:33:30.8518405Z #47 659.8 2025-09-07T10:33:30.8519066Z #47 659.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:30.8519936Z #47 659.8 2025-09-07T10:33:30.8522167Z #47 659.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:33:30.8524896Z #47 659.8 bool use_swa = window_left != -1; 2025-09-07T10:33:30.8525452Z #47 659.8 ^ 2025-09-07T10:33:30.8525819Z #47 659.8 2025-09-07T10:33:32.1513503Z #47 661.1 [269/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:33:35.1950522Z #47 664.2 [270/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o 2025-09-07T10:33:35.1969544Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.1991957Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = 
static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = 
plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2012739Z #47 664.2 ^ 2025-09-07T10:33:35.2013267Z #47 664.2 2025-09-07T10:33:35.2014205Z #47 664.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:35.2015128Z #47 664.2 2025-09-07T10:33:35.2017488Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2039909Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2061671Z #47 664.2 ^ 2025-09-07T10:33:35.2062183Z #47 664.2 2025-09-07T10:33:35.2064654Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2087272Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2108830Z #47 664.2 ^ 2025-09-07T10:33:35.2109341Z #47 664.2 2025-09-07T10:33:35.2112099Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2134957Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2160809Z #47 664.2 ^ 2025-09-07T10:33:35.2161221Z #47 664.2 2025-09-07T10:33:35.2163207Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2180587Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2198116Z #47 664.2 ^ 2025-09-07T10:33:35.2198518Z #47 664.2 2025-09-07T10:33:35.2200301Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2217460Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2233214Z #47 664.2 ^ 2025-09-07T10:33:35.2233598Z #47 664.2 2025-09-07T10:33:35.2235376Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2253674Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2274583Z #47 664.2 ^ 2025-09-07T10:33:35.2275093Z #47 664.2 2025-09-07T10:33:35.2277483Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2300978Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2322546Z #47 664.2 ^ 2025-09-07T10:33:35.2323050Z #47 664.2 2025-09-07T10:33:35.2325379Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2349345Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2369880Z #47 664.2 ^ 2025-09-07T10:33:35.2370408Z #47 664.2 2025-09-07T10:33:35.2371343Z #47 664.2 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:33:35.2372260Z #47 664.2 2025-09-07T10:33:35.2374646Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2397129Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2418386Z #47 664.2 ^ 2025-09-07T10:33:35.2418911Z #47 664.2 2025-09-07T10:33:35.2421327Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2444040Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2461000Z #47 664.2 ^ 2025-09-07T10:33:35.2461388Z #47 664.2 2025-09-07T10:33:35.2463151Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2479490Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2496946Z #47 664.2 ^ 2025-09-07T10:33:35.2497428Z #47 664.2 2025-09-07T10:33:35.2499845Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2523293Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2543371Z #47 664.2 ^ 2025-09-07T10:33:35.2543834Z #47 664.2 2025-09-07T10:33:35.2546099Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2568714Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2586018Z #47 664.2 ^ 2025-09-07T10:33:35.2586418Z #47 664.2 2025-09-07T10:33:35.2588212Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2605458Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2621495Z #47 664.2 ^ 2025-09-07T10:33:35.2621877Z #47 664.2 2025-09-07T10:33:35.2623647Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2640527Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2659073Z #47 664.2 ^ 2025-09-07T10:33:35.2659588Z #47 664.2 2025-09-07T10:33:35.2662232Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2684770Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2705832Z #47 664.2 ^ 2025-09-07T10:33:35.2706376Z #47 664.2 2025-09-07T10:33:35.2707134Z #47 664.2 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:33:35.2708068Z #47 664.2 2025-09-07T10:33:35.2710423Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2734803Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2755804Z #47 664.2 ^ 2025-09-07T10:33:35.2756327Z #47 664.2 2025-09-07T10:33:35.2758662Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2781165Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2801703Z #47 664.2 ^ 2025-09-07T10:33:35.2802228Z #47 664.2 2025-09-07T10:33:35.2804636Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2826884Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2847948Z #47 664.2 ^ 2025-09-07T10:33:35.2848479Z #47 664.2 2025-09-07T10:33:35.2850817Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2868073Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2884477Z #47 664.2 ^ 2025-09-07T10:33:35.2884901Z #47 664.2 2025-09-07T10:33:35.2886664Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2904317Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2920075Z #47 664.2 ^ 2025-09-07T10:33:35.2920637Z #47 664.2 2025-09-07T10:33:35.2922389Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2939495Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2955379Z #47 664.2 ^ 2025-09-07T10:33:35.2955782Z #47 664.2 2025-09-07T10:33:35.2957541Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.2974737Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.2990481Z #47 664.2 ^ 2025-09-07T10:33:35.2990897Z #47 664.2 2025-09-07T10:33:35.2992841Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3009078Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3024637Z #47 664.2 ^ 2025-09-07T10:33:35.3025039Z #47 664.2 2025-09-07T10:33:35.3025595Z #47 664.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:35.3026320Z #47 664.2 2025-09-07T10:33:35.3028078Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3044645Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3063381Z #47 664.2 ^ 2025-09-07T10:33:35.3063928Z #47 664.2 2025-09-07T10:33:35.3066320Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3088814Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3110539Z #47 664.2 ^ 2025-09-07T10:33:35.3111097Z #47 664.2 2025-09-07T10:33:35.3113490Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3135068Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3156202Z #47 664.2 ^ 2025-09-07T10:33:35.3156984Z #47 664.2 2025-09-07T10:33:35.3159472Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3182881Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3204705Z #47 664.2 ^ 2025-09-07T10:33:35.3205203Z #47 664.2 2025-09-07T10:33:35.3207516Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3230732Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3251689Z #47 664.2 ^ 2025-09-07T10:33:35.3252089Z #47 664.2 2025-09-07T10:33:35.3253838Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3271057Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3286713Z #47 664.2 ^ 2025-09-07T10:33:35.3287116Z #47 664.2 2025-09-07T10:33:35.3288867Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3306072Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3321843Z #47 664.2 ^ 2025-09-07T10:33:35.3322243Z #47 664.2 2025-09-07T10:33:35.3323998Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3340430Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3358692Z #47 664.2 ^ 2025-09-07T10:33:35.3359182Z #47 664.2 2025-09-07T10:33:35.3359908Z #47 664.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:35.3360753Z #47 664.2 2025-09-07T10:33:35.3363263Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3384890Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3405668Z #47 664.2 ^ 2025-09-07T10:33:35.3406178Z #47 664.2 2025-09-07T10:33:35.3408501Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3432307Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3450812Z #47 664.2 ^ 2025-09-07T10:33:35.3451414Z #47 664.2 2025-09-07T10:33:35.3453663Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3477059Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3497642Z #47 664.2 ^ 2025-09-07T10:33:35.3498164Z #47 664.2 2025-09-07T10:33:35.3500623Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3524167Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3546689Z #47 664.2 ^ 2025-09-07T10:33:35.3547220Z #47 664.2 2025-09-07T10:33:35.3550176Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3573084Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3594324Z #47 664.2 ^ 2025-09-07T10:33:35.3594817Z #47 664.2 2025-09-07T10:33:35.3597186Z #47 664.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3621150Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3643321Z #47 664.2 ^ 2025-09-07T10:33:35.3643823Z #47 664.2 2025-09-07T10:33:35.3646233Z #47 664.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:35.3767296Z #47 664.2 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:35.3783463Z #47 664.2 ^ 2025-09-07T10:33:35.3783857Z #47 664.2 2025-09-07T10:33:35.6005928Z #47 664.6 [271/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:33:38.5020365Z #47 667.5 [272/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC 
--expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:33:41.2527986Z #47 670.2 [273/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o 2025-09-07T10:33:42.4084454Z #47 671.4 [274/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:33:42.7765450Z #47 671.7 [275/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o 2025-09-07T10:33:42.7783153Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.7805236Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.7824535Z #47 671.7 ^ 2025-09-07T10:33:42.7824970Z #47 671.7 2025-09-07T10:33:42.7825599Z #47 671.7 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:33:42.7826516Z #47 671.7 2025-09-07T10:33:42.7828651Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.7849830Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.7868889Z #47 671.7 ^ 2025-09-07T10:33:42.7869340Z #47 671.7 2025-09-07T10:33:42.7871516Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.7891680Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.7910683Z #47 671.7 ^ 2025-09-07T10:33:42.7911124Z #47 671.7 2025-09-07T10:33:42.7913445Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.7934214Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.7954297Z #47 671.7 ^ 2025-09-07T10:33:42.7954746Z #47 671.7 2025-09-07T10:33:42.7957033Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.7978369Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.7998125Z #47 671.7 ^ 2025-09-07T10:33:42.7998614Z #47 671.7 2025-09-07T10:33:42.8001076Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8023494Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8043894Z #47 671.7 ^ 2025-09-07T10:33:42.8044366Z #47 671.7 2025-09-07T10:33:42.8046619Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8068782Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8089047Z #47 671.7 ^ 2025-09-07T10:33:42.8089521Z #47 671.7 2025-09-07T10:33:42.8091951Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8114075Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8135521Z #47 671.7 ^ 2025-09-07T10:33:42.8136030Z #47 671.7 2025-09-07T10:33:42.8138400Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8159427Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8178572Z #47 671.7 ^ 2025-09-07T10:33:42.8179084Z #47 671.7 2025-09-07T10:33:42.8179815Z #47 671.7 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:33:42.8180581Z #47 671.7 2025-09-07T10:33:42.8182634Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8203474Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8222399Z #47 671.7 ^ 2025-09-07T10:33:42.8222885Z #47 671.7 2025-09-07T10:33:42.8225000Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8244664Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8307298Z #47 671.7 ^ 2025-09-07T10:33:42.8307746Z #47 671.7 2025-09-07T10:33:42.8310016Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8330506Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8349747Z #47 671.7 ^ 2025-09-07T10:33:42.8350174Z #47 671.7 2025-09-07T10:33:42.8352335Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8374241Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8393603Z #47 671.7 ^ 2025-09-07T10:33:42.8394105Z #47 671.7 2025-09-07T10:33:42.8396511Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8418658Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8437957Z #47 671.7 ^ 2025-09-07T10:33:42.8438439Z #47 671.7 2025-09-07T10:33:42.8440614Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8461729Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8481085Z #47 671.7 ^ 2025-09-07T10:33:42.8481587Z #47 671.7 2025-09-07T10:33:42.8483739Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8504797Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8524317Z #47 671.7 ^ 2025-09-07T10:33:42.8524826Z #47 671.7 2025-09-07T10:33:42.8527027Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8548496Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8567584Z #47 671.7 ^ 2025-09-07T10:33:42.8568196Z #47 671.7 2025-09-07T10:33:42.8568830Z #47 671.7 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:33:42.8569715Z #47 671.7 2025-09-07T10:33:42.8571992Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8591863Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8610803Z #47 671.7 ^ 2025-09-07T10:33:42.8611440Z #47 671.7 2025-09-07T10:33:42.8613600Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8634190Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8653565Z #47 671.7 ^ 2025-09-07T10:33:42.8654001Z #47 671.7 2025-09-07T10:33:42.8656352Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8676638Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8695919Z #47 671.7 ^ 2025-09-07T10:33:42.8696340Z #47 671.7 2025-09-07T10:33:42.8698495Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8720082Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8739667Z #47 671.7 ^ 2025-09-07T10:33:42.8740164Z #47 671.7 2025-09-07T10:33:42.8742590Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8764106Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8783881Z #47 671.7 ^ 2025-09-07T10:33:42.8784323Z #47 671.7 2025-09-07T10:33:42.8786639Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8807508Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8826676Z #47 671.7 ^ 2025-09-07T10:33:42.8827192Z #47 671.7 2025-09-07T10:33:42.8829529Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8850426Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8870568Z #47 671.7 ^ 2025-09-07T10:33:42.8871079Z #47 671.7 2025-09-07T10:33:42.8873316Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8894503Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8913842Z #47 671.7 ^ 2025-09-07T10:33:42.8914308Z #47 671.7 2025-09-07T10:33:42.8915069Z #47 671.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T10:33:42.8915937Z #47 671.7 2025-09-07T10:33:42.8918127Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8939121Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8957946Z #47 671.7 ^ 2025-09-07T10:33:42.8958356Z #47 671.7 2025-09-07T10:33:42.8960558Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.8980857Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.8999180Z #47 671.7 ^ 2025-09-07T10:33:42.8999679Z #47 671.7 2025-09-07T10:33:42.9002122Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9022585Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9042030Z #47 671.7 ^ 2025-09-07T10:33:42.9042532Z #47 671.7 2025-09-07T10:33:42.9044765Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9067035Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9086538Z #47 671.7 ^ 2025-09-07T10:33:42.9087003Z #47 671.7 2025-09-07T10:33:42.9089279Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9111011Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9130651Z #47 671.7 ^ 2025-09-07T10:33:42.9131236Z #47 671.7 2025-09-07T10:33:42.9133445Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9155417Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9190059Z #47 671.7 ^ 2025-09-07T10:33:42.9190528Z #47 671.7 2025-09-07T10:33:42.9192955Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9213983Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9233552Z #47 671.7 ^ 2025-09-07T10:33:42.9234027Z #47 671.7 2025-09-07T10:33:42.9236500Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9257452Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9275865Z #47 671.7 ^ 2025-09-07T10:33:42.9276349Z #47 671.7 2025-09-07T10:33:42.9277100Z #47 671.7 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:33:42.9277961Z #47 671.7 2025-09-07T10:33:42.9280253Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9300650Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9319076Z #47 671.7 ^ 2025-09-07T10:33:42.9319495Z #47 671.7 2025-09-07T10:33:42.9321683Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9342447Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9361413Z #47 671.7 ^ 2025-09-07T10:33:42.9361902Z #47 671.7 2025-09-07T10:33:42.9364048Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9384567Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9403842Z #47 671.7 ^ 2025-09-07T10:33:42.9404457Z #47 671.7 2025-09-07T10:33:42.9406550Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9427610Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9447107Z #47 671.7 ^ 2025-09-07T10:33:42.9447632Z #47 671.7 2025-09-07T10:33:42.9449981Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9471293Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9490748Z #47 671.7 ^ 2025-09-07T10:33:42.9491381Z #47 671.7 2025-09-07T10:33:42.9493634Z #47 671.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9514449Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9533557Z #47 671.7 ^ 2025-09-07T10:33:42.9534021Z #47 671.7 2025-09-07T10:33:42.9536514Z #47 671.7 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T10:33:42.9557533Z #47 671.7 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T10:33:42.9577106Z #47 671.7 ^ 2025-09-07T10:33:42.9577577Z #47 671.7 2025-09-07T10:33:43.2903771Z #47 672.3 [276/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:33:44.5074614Z #47 673.5 [277/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:33:46.0627391Z #47 675.0 [278/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:33:54.2354617Z #47 683.2 [279/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:34:06.4018650Z #47 695.4 [280/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:34:11.3326378Z #47 700.3 [281/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:34:11.7668526Z #47 700.7 [282/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:34:17.2464363Z #47 706.2 [283/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:34:19.4856939Z #47 708.5 [284/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:34:20.2093975Z #47 709.2 [285/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:34:21.5183845Z #47 710.5 [286/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:34:22.7925344Z #47 711.8 [287/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:34:22.8093408Z #47 711.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:34:22.8096043Z #47 711.8 bool use_swa = window_left != -1; 2025-09-07T10:34:22.8096861Z #47 711.8 ^ 2025-09-07T10:34:22.8097288Z #47 711.8 2025-09-07T10:34:22.8097986Z #47 711.8 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:34:22.8098885Z #47 711.8 2025-09-07T10:34:22.8101209Z #47 711.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:34:22.8103973Z #47 711.8 bool use_swa = window_left != -1; 2025-09-07T10:34:22.8104682Z #47 711.8 ^ 2025-09-07T10:34:22.8105131Z #47 711.8 2025-09-07T10:34:23.0704990Z #47 712.0 [288/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:34:27.8224301Z #47 716.8 [289/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fmha_cutlass_sm100a/fmha_cutlass_sm100_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=fmha_cutlass_sm100a -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/fmha_cutlass_sm100_pybind.cu -o fmha_cutlass_sm100a/fmha_cutlass_sm100_pybind.cuda.o 2025-09-07T10:34:28.2911786Z #47 717.3 [290/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a 
-DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:34:28.4790884Z #47 717.4 [291/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o 2025-09-07T10:34:28.4804926Z #47 717.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:34:28.4807175Z #47 717.4 bool use_swa = window_left != -1; 2025-09-07T10:34:28.4807682Z #47 717.4 ^ 2025-09-07T10:34:28.4808011Z #47 717.4 2025-09-07T10:34:28.4808612Z #47 717.4 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T10:34:28.4809315Z #47 717.4 2025-09-07T10:34:28.4811519Z #47 717.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T10:34:28.4813513Z #47 717.4 bool use_swa = window_left != -1; 2025-09-07T10:34:28.4813988Z #47 717.4 ^ 2025-09-07T10:34:28.4814334Z #47 717.4 2025-09-07T10:34:28.6763083Z #47 717.6 [292/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:34:29.3846589Z #47 718.4 [293/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:34:30.5288827Z #47 719.5 [294/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:34:33.0676034Z #47 722.0 [295/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T10:34:33.6344117Z #47 722.6 [296/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:34:35.0830034Z #47 724.1 [297/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fmha_cutlass_sm100a/blackwell_fmha_plan.cuda.o.d -DTORCH_EXTENSION_NAME=fmha_cutlass_sm100a -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a 
-DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/blackwell_fmha_plan.cu -o fmha_cutlass_sm100a/blackwell_fmha_plan.cuda.o 2025-09-07T10:34:35.2716015Z #47 724.2 [298/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:34:35.9300725Z #47 724.9 [299/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o 2025-09-07T10:34:35.9318570Z #47 724.9 
In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:35.9322433Z #47 724.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:35.9325430Z #47 724.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:35.9326838Z #47 724.9 | 2025-09-07T10:34:35.9329362Z #47 724.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:35.9333173Z #47 724.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:35.9336036Z #47 724.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:35.9337425Z #47 724.9 | 2025-09-07T10:34:35.9339580Z #47 724.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:35.9343506Z #47 724.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:35.9346385Z #47 724.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:35.9347734Z #47 724.9 | 2025-09-07T10:34:35.9350274Z #47 724.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:35.9353936Z #47 724.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:35.9356992Z #47 724.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:35.9358369Z #47 724.9 | 2025-09-07T10:34:35.9360444Z #47 724.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:35.9364142Z #47 724.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:35.9366997Z #47 724.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:35.9368333Z #47 724.9 | 2025-09-07T10:34:35.9370425Z #47 724.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:35.9374193Z #47 724.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:35.9377192Z #47 724.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:35.9378769Z #47 724.9 | 2025-09-07T10:34:36.2369868Z #47 725.2 [300/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o 2025-09-07T10:34:36.2387569Z #47 725.2 In file included from 
/workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T10:34:36.2391758Z #47 725.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:36.2394778Z #47 725.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:36.2396108Z #47 725.2 | 2025-09-07T10:34:36.2398360Z #47 725.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T10:34:36.2402194Z #47 725.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:36.2405232Z #47 725.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:36.2406620Z #47 725.2 | 2025-09-07T10:34:39.6596798Z #47 728.6 [301/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:34:41.0732869Z #47 730.0 [302/412] c++ -MMD -MF trtllm_utils/logger.o.d 
-DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/logger.cpp -o trtllm_utils/logger.o 2025-09-07T10:34:41.3022603Z #47 730.3 [303/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o 2025-09-07T10:34:41.3038411Z #47 730.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T10:34:41.3041660Z #47 730.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:41.3044485Z #47 730.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:41.3045814Z #47 730.3 | 2025-09-07T10:34:41.3048246Z #47 730.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T10:34:41.3052225Z #47 730.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:41.3054611Z #47 730.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:41.3055596Z #47 730.3 | 2025-09-07T10:34:41.5556556Z #47 730.5 [304/412] c++ -MMD -MF trtllm_utils/stringUtils.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/stringUtils.cpp -o trtllm_utils/stringUtils.o 
2025-09-07T10:34:42.1408942Z #47 731.1 [305/412] c++ logging/logging.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o logging/logging.so 2025-09-07T10:34:42.3239009Z #47 731.1 [306/412] c++ -MMD -MF trtllm_utils/envUtils.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/envUtils.cpp -o trtllm_utils/envUtils.o 2025-09-07T10:34:42.4126055Z #47 731.4 [307/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:42.9642467Z #47 731.9 [308/412] c++ -MMD -MF trtllm_utils/tllmException.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/tllmException.cpp -o trtllm_utils/tllmException.o 2025-09-07T10:34:43.1548659Z #47 732.1 [309/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:43.3439554Z #47 732.2 [310/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:43.3467766Z #47 732.3 [311/412] c++ batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:43.5201597Z #47 732.3 [312/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o 2025-09-07T10:34:43.5220461Z #47 732.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:34:43.5224662Z #47 732.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:43.5227753Z #47 732.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:43.5229193Z #47 732.3 | 2025-09-07T10:34:43.5231677Z #47 732.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:34:43.5235555Z #47 732.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:43.5238412Z #47 732.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:43.5239705Z #47 732.3 | 2025-09-07T10:34:43.5241670Z #47 732.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:34:43.5245234Z #47 732.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:43.5248043Z #47 732.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:43.5249655Z #47 732.3 | 2025-09-07T10:34:43.5252004Z #47 732.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:34:43.5255730Z #47 732.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:43.5258737Z #47 732.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:43.5260148Z #47 732.3 | 2025-09-07T10:34:43.5262446Z #47 732.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:34:43.5266383Z #47 732.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:43.5269549Z #47 732.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:43.5271021Z #47 732.3 | 2025-09-07T10:34:43.5273348Z #47 732.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:34:43.5277459Z #47 732.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:43.5280568Z #47 732.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:43.5282144Z #47 732.3 | 2025-09-07T10:34:43.9906615Z #47 733.0 [313/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib 
-L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:44.1687284Z #47 733.0 [314/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:44.2155731Z #47 733.2 [315/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:44.4371846Z #47 733.2 [316/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:44.4391503Z #47 733.3 [317/412] c++ 
batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:44.8963284Z #47 733.9 [318/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:45.0312386Z #47 734.0 [319/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:45.2257363Z #47 734.2 [320/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:45.4397953Z #47 734.3 [321/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:45.4419634Z #47 734.4 [322/412] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o 2025-09-07T10:34:45.5901622Z #47 734.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 
2025-09-07T10:34:45.5905285Z #47 734.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:45.5908174Z #47 734.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:45.5909480Z #47 734.4 | 2025-09-07T10:34:45.5911363Z #47 734.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:45.5914804Z #47 734.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:45.5917599Z #47 734.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:45.5918907Z #47 734.4 | 2025-09-07T10:34:45.5920951Z #47 734.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:45.5924575Z #47 734.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:45.5927649Z #47 734.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:45.5928897Z #47 734.4 | 2025-09-07T10:34:45.5930878Z #47 734.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:45.5934617Z #47 734.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:45.5937782Z #47 734.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:45.5939063Z #47 734.4 | 2025-09-07T10:34:45.5941121Z #47 734.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:45.5944628Z #47 734.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:45.5947751Z #47 734.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:34:45.5949333Z #47 734.4 | 2025-09-07T10:34:45.5951616Z #47 734.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T10:34:45.5955286Z #47 734.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:34:45.5958008Z #47 734.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:34:45.5959334Z #47 734.4 | 2025-09-07T10:34:45.6796232Z #47 734.6 [323/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:45.9521520Z #47 734.9 [324/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 
single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:46.0957769Z #47 735.1 [325/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:46.2532233Z #47 735.1 [326/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch 
-lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:46.2553492Z #47 735.1 [327/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:46.2568267Z #47 735.2 [328/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:46.5061843Z #47 735.5 [329/412] c++ batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:46.7540266Z #47 735.7 [330/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:46.9205573Z #47 
735.9 [331/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:47.2226950Z #47 736.2 [332/412] c++ batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:47.3904034Z #47 736.4 [333/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:34:47.6213441Z #47 736.4 [334/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:47.6241488Z #47 736.4 [335/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:47.6265458Z #47 736.4 [336/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:34:47.7744406Z #47 736.7 [337/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:47.8821430Z #47 736.8 [338/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda 
-ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:47.8843619Z #47 736.9 [339/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:48.3334242Z #47 737.3 [340/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:48.4653801Z #47 737.3 [341/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:48.4666230Z #47 737.4 [342/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T10:34:48.4687463Z #47 737.4 [343/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T10:34:48.6852437Z #47 737.7 [344/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T10:34:48.8346891Z #47 737.8 [345/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T10:34:49.0470316Z #47 737.9 [346/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T10:34:49.0501541Z #47 738.0 [347/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T10:34:49.3189312Z #47 738.3 [348/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so 2025-09-07T10:34:49.5011520Z #47 738.4 [349/412] c++ batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T10:34:49.5051138Z #47 738.5 [350/412] c++ batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T10:34:49.6344986Z #47 738.6 [351/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so 2025-09-07T10:34:49.7946245Z #47 738.6 [352/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so 2025-09-07T10:34:49.7972872Z #47 738.8 [353/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so 2025-09-07T10:34:50.0665357Z #47 739.0 [354/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so 2025-09-07T10:35:02.3236202Z #47 751.3 [355/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o 2025-09-07T10:35:02.4735786Z #47 751.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T10:35:02.4740009Z #47 751.3 
/workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:02.4743210Z #47 751.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:02.4744717Z #47 751.3 | 2025-09-07T10:35:02.4747237Z #47 751.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T10:35:02.4752465Z #47 751.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:02.4755401Z #47 751.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:02.4756612Z #47 751.3 | 2025-09-07T10:35:03.2490369Z #47 752.2 [356/412] c++ batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so 2025-09-07T10:35:05.5445702Z #47 754.5 [357/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include 
-isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:35:06.0702625Z #47 755.0 [358/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o 2025-09-07T10:35:06.0719504Z #47 755.0 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:35:06.0723121Z #47 755.0 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:06.0725955Z #47 755.0 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:06.0727259Z #47 755.0 | 2025-09-07T10:35:06.0729383Z #47 755.0 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:35:06.0733474Z #47 755.0 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:06.0736301Z #47 755.0 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:06.0737578Z #47 755.0 | 2025-09-07T10:35:06.0739651Z #47 755.0 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:35:06.0743211Z #47 755.0 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:06.0746254Z #47 755.0 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:06.0747539Z #47 755.0 | 2025-09-07T10:35:06.0749978Z #47 755.0 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:35:06.0753712Z #47 755.0 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:06.0756528Z #47 755.0 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:06.0757936Z #47 755.0 | 2025-09-07T10:35:06.0760026Z #47 755.0 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:35:06.0763561Z #47 755.0 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:06.0766364Z #47 755.0 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:06.0767679Z #47 755.0 | 2025-09-07T10:35:06.0769802Z #47 755.0 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T10:35:06.0773537Z #47 755.0 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:06.0776292Z #47 755.0 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:06.0777607Z #47 755.0 | 2025-09-07T10:35:08.6973587Z #47 757.7 [359/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o 2025-09-07T10:35:08.8473919Z #47 757.7 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T10:35:08.8477710Z #47 757.7 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:08.8480915Z #47 757.7 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:08.8482231Z #47 757.7 | 2025-09-07T10:35:08.8484580Z #47 757.7 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T10:35:08.8488660Z #47 757.7 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:08.8491527Z #47 757.7 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:08.8492851Z #47 757.7 | 2025-09-07T10:35:15.1712079Z #47 764.1 [360/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o 2025-09-07T10:35:15.1725354Z #47 764.1 In file included from 
/workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:15.1727966Z #47 764.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:15.1730058Z #47 764.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:15.1731200Z #47 764.1 | 2025-09-07T10:35:15.1732947Z #47 764.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:15.1735691Z #47 764.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:15.1737751Z #47 764.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:15.1738707Z #47 764.1 | 2025-09-07T10:35:15.1740275Z #47 764.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:15.1743024Z #47 764.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:15.1745218Z #47 764.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:15.1746245Z #47 764.1 | 2025-09-07T10:35:15.1747866Z #47 764.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:15.1750917Z #47 764.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:15.1753323Z #47 764.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:15.1754361Z #47 764.1 | 2025-09-07T10:35:15.1755944Z #47 764.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:15.1758682Z #47 764.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:15.1760955Z #47 764.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:15.1762371Z #47 764.1 | 2025-09-07T10:35:15.1763970Z #47 764.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:15.1766597Z #47 764.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:15.1768651Z #47 764.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:15.1769780Z #47 764.1 | 2025-09-07T10:35:16.0320990Z #47 765.0 [361/412] c++ batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so 2025-09-07T10:35:17.4300823Z #47 766.4 [362/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC 
--expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o 2025-09-07T10:35:17.4317822Z #47 766.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T10:35:17.4321644Z #47 766.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:17.4324944Z #47 766.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:17.4326289Z #47 766.4 | 2025-09-07T10:35:17.4328531Z #47 766.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T10:35:17.4332662Z #47 766.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:17.4335888Z #47 766.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:17.4337272Z #47 766.4 | 2025-09-07T10:35:23.1118870Z #47 772.1 [363/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o 2025-09-07T10:35:23.1135092Z #47 772.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:23.1139141Z #47 772.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:23.1141837Z #47 772.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:23.1143088Z #47 772.1 | 2025-09-07T10:35:23.1145229Z #47 772.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:23.1148439Z #47 772.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:23.1151295Z #47 772.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:23.1152468Z #47 772.1 | 2025-09-07T10:35:23.1154438Z #47 772.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:23.1158077Z #47 772.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:23.1160729Z #47 772.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:23.1162234Z #47 772.1 | 2025-09-07T10:35:23.1164324Z #47 772.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:23.1167992Z #47 772.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:23.1170812Z #47 772.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:23.1172260Z #47 772.1 | 2025-09-07T10:35:23.1174380Z #47 772.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:23.1177849Z #47 772.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:23.1180595Z #47 772.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:23.1181898Z #47 772.1 | 2025-09-07T10:35:23.1184298Z #47 772.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T10:35:23.1187977Z #47 772.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:23.1190257Z #47 772.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:23.1191388Z #47 772.1 | 2025-09-07T10:35:23.7130221Z #47 772.7 [364/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output mla/flashinfer_mla_ops.cuda.o.d -DTORCH_EXTENSION_NAME=mla -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/flashinfer_mla_ops.cu -o mla/flashinfer_mla_ops.cuda.o 2025-09-07T10:35:24.0910819Z #47 773.1 [365/412] c++ batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so 2025-09-07T10:35:28.4479571Z #47 777.4 [366/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o 
2025-09-07T10:35:28.4497357Z #47 777.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T10:35:28.4501100Z #47 777.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:28.4504418Z #47 777.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T10:35:28.4505822Z #47 777.4 | 2025-09-07T10:35:28.4508105Z #47 777.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T10:35:28.4512237Z #47 777.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T10:35:28.4515140Z #47 777.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T10:35:28.4516727Z #47 777.4 | 2025-09-07T10:35:29.0726971Z #47 778.0 [367/412] c++ batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so 2025-09-07T10:35:29.3254507Z #47 778.3 [368/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output norm/flashinfer_norm_ops.cuda.o.d -DTORCH_EXTENSION_NAME=norm -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_norm_ops.cu -o norm/flashinfer_norm_ops.cuda.o 2025-09-07T10:35:30.0149032Z #47 779.0 [369/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cascade/flashinfer_cascade_ops.cuda.o.d -DTORCH_EXTENSION_NAME=cascade -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_cascade_ops.cu -o cascade/flashinfer_cascade_ops.cuda.o 2025-09-07T10:35:31.4034080Z #47 780.4 [370/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:35:31.5230054Z #47 780.5 [371/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output quantization/flashinfer_quantization_ops.cuda.o.d -DTORCH_EXTENSION_NAME=quantization -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_quantization_ops.cu -o quantization/flashinfer_quantization_ops.cuda.o 2025-09-07T10:35:31.7158538Z #47 780.5 [372/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output page/flashinfer_page_ops.cuda.o.d -DTORCH_EXTENSION_NAME=page -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_page_ops.cu -o page/flashinfer_page_ops.cuda.o 2025-09-07T10:35:31.9159515Z #47 780.9 [373/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so 2025-09-07T10:35:32.2652634Z #47 781.2 [374/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cascade/cascade.cuda.o.d -DTORCH_EXTENSION_NAME=cascade -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/cascade.cu -o cascade/cascade.cuda.o 2025-09-07T10:35:32.4952320Z #47 781.5 [375/412] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output page/page.cuda.o.d -DTORCH_EXTENSION_NAME=page -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/page.cu -o page/page.cuda.o 2025-09-07T10:35:32.7185494Z #47 781.7 [376/412] c++ cascade/cascade.cuda.o cascade/flashinfer_cascade_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o cascade/cascade.so 2025-09-07T10:35:32.9583144Z #47 781.9 [377/412] c++ page/page.cuda.o page/flashinfer_page_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o page/page.so 2025-09-07T10:35:33.2163106Z #47 782.2 [378/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output norm/norm.cuda.o.d -DTORCH_EXTENSION_NAME=norm -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/norm.cu -o norm/norm.cuda.o 2025-09-07T10:35:33.8399790Z #47 782.8 [379/412] c++ norm/norm.cuda.o norm/flashinfer_norm_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o norm/norm.so 2025-09-07T10:35:36.3954817Z #47 785.4 [380/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output rope/flashinfer_rope_ops.cuda.o.d -DTORCH_EXTENSION_NAME=rope -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_rope_ops.cu -o rope/flashinfer_rope_ops.cuda.o 2025-09-07T10:35:37.4609990Z #47 786.4 [381/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output quantization/quantization.cuda.o.d -DTORCH_EXTENSION_NAME=quantization -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/quantization.cu -o quantization/quantization.cuda.o 2025-09-07T10:35:37.8896718Z #47 786.9 [382/412] c++ 
quantization/quantization.cuda.o quantization/flashinfer_quantization_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o quantization/quantization.so 2025-09-07T10:35:38.7265437Z #47 787.7 [383/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sampling/flashinfer_sampling_ops.cuda.o.d -DTORCH_EXTENSION_NAME=sampling -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_sampling_ops.cu -o sampling/flashinfer_sampling_ops.cuda.o 2025-09-07T10:35:40.6958178Z #47 789.7 [384/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output trtllm_utils/delayStream.cuda.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include 
-I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu -o trtllm_utils/delayStream.cuda.o 2025-09-07T10:35:41.0827125Z #47 790.1 [385/412] c++ trtllm_utils/delayStream.cuda.o trtllm_utils/envUtils.o trtllm_utils/logger.o trtllm_utils/stringUtils.o trtllm_utils/tllmException.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o trtllm_utils/trtllm_utils.so 2025-09-07T10:35:41.7167194Z #47 790.7 [386/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:35:45.9508351Z #47 794.9 [387/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:35:47.5588768Z #47 796.5 [388/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:35:48.0546326Z #47 797.0 [389/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:35:48.5376603Z #47 797.5 [390/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:35:49.1209004Z #47 798.1 [391/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:35:49.2770082Z #47 798.2 [392/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:35:49.2787822Z #47 798.2 [393/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:35:52.2684394Z #47 801.2 [394/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:35:52.7523206Z #47 801.7 [395/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:35:52.9735794Z #47 801.9 [396/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sampling/renorm.cuda.o.d -DTORCH_EXTENSION_NAME=sampling -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/renorm.cu -o sampling/renorm.cuda.o 2025-09-07T10:35:54.2779249Z #47 803.2 [397/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC 
--expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T10:35:54.5733155Z #47 803.5 [398/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:35:54.7862202Z #47 803.8 [399/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a 
-DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T10:35:56.5785088Z #47 805.5 [400/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:35:56.9671653Z #47 805.9 [401/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output rope/rope.cuda.o.d -DTORCH_EXTENSION_NAME=rope -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/rope.cu -o rope/rope.cuda.o 2025-09-07T10:35:57.1471324Z #47 806.0 [402/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so 2025-09-07T10:35:57.3430468Z #47 806.3 [403/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T10:35:57.5203406Z #47 806.4 [404/412] c++ rope/rope.cuda.o rope/flashinfer_rope_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o rope/rope.so 2025-09-07T10:35:57.5213489Z #47 806.5 [405/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T10:35:57.9353374Z #47 806.9 [406/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so 2025-09-07T10:36:22.0022644Z #47 831.0 [407/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sampling/sampling.cuda.o.d -DTORCH_EXTENSION_NAME=sampling -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/sampling.cu -o sampling/sampling.cuda.o 2025-09-07T10:36:22.4239986Z #47 831.4 [408/412] c++ sampling/sampling.cuda.o sampling/renorm.cuda.o sampling/flashinfer_sampling_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o sampling/sampling.so 2025-09-07T10:36:51.2180408Z #47 860.2 [409/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output mla/cutlass_mla.cuda.o.d -DTORCH_EXTENSION_NAME=mla -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/cutlass_mla.cu -o mla/cutlass_mla.cuda.o 2025-09-07T10:36:51.5231453Z #47 860.5 [410/412] c++ mla/cutlass_mla.cuda.o mla/flashinfer_mla_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib 
-L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o mla/mla.so 2025-09-07T10:37:00.4433221Z #47 869.4 [411/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fmha_cutlass_sm100a/fmha_cutlass_sm100.cuda.o.d -DTORCH_EXTENSION_NAME=fmha_cutlass_sm100a -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/fmha_cutlass_sm100.cu -o fmha_cutlass_sm100a/fmha_cutlass_sm100.cuda.o 2025-09-07T10:37:00.7367280Z #47 869.7 [412/412] c++ fmha_cutlass_sm100a/fmha_cutlass_sm100.cuda.o fmha_cutlass_sm100a/fmha_cutlass_sm100_pybind.cuda.o fmha_cutlass_sm100a/blackwell_fmha_plan.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o fmha_cutlass_sm100a/fmha_cutlass_sm100a.so 2025-09-07T10:37:01.0173512Z #47 870.0 AOT kernels saved to: /workspace/flashinfer/aot-ops 2025-09-07T10:37:01.5107450Z #47 870.5 * Getting build dependencies for wheel... 
2025-09-07T10:37:01.6551626Z #47 870.6 60 AOT ops found in /workspace/flashinfer/aot-ops 2025-09-07T10:37:01.7898319Z #47 870.8 * Building wheel... 2025-09-07T10:37:03.7767959Z #47 872.7 W0907 10:37:03.775000 22980 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:119] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 2025-09-07T10:37:04.0782716Z #47 873.0 60 AOT ops found in /workspace/flashinfer/aot-ops 2025-09-07T10:37:04.0783191Z #47 873.0 running bdist_wheel 2025-09-07T10:37:04.2147738Z #47 873.1 running build 2025-09-07T10:37:04.2148559Z #47 873.1 running build_py 2025-09-07T10:37:04.2149490Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2150153Z #47 873.1 copying flashinfer/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2150919Z #47 873.1 copying flashinfer/__main__.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2152157Z #47 873.1 copying flashinfer/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2153036Z #47 873.1 copying flashinfer/aot.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2154583Z #47 873.1 copying flashinfer/artifacts.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2155432Z #47 873.1 copying flashinfer/attention.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2156494Z #47 873.1 copying flashinfer/autotuner.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2157511Z #47 873.1 copying flashinfer/cascade.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2158279Z #47 873.1 copying flashinfer/cuda_utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2159272Z #47 873.1 copying flashinfer/decode.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2160024Z #47 873.1 copying flashinfer/deep_gemm.py -> 
build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2161071Z #47 873.1 copying flashinfer/fp4_quantization.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2162169Z #47 873.1 copying flashinfer/fp8_quantization.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2162940Z #47 873.1 copying flashinfer/gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2163839Z #47 873.1 copying flashinfer/green_ctx.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2164565Z #47 873.1 copying flashinfer/mla.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2165266Z #47 873.1 copying flashinfer/norm.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2166074Z #47 873.1 copying flashinfer/page.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2166856Z #47 873.1 copying flashinfer/pod.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2167574Z #47 873.1 copying flashinfer/prefill.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2168623Z #47 873.1 copying flashinfer/quantization.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2169385Z #47 873.1 copying flashinfer/rope.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2170423Z #47 873.1 copying flashinfer/sampling.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2171703Z #47 873.1 copying flashinfer/sparse.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2172574Z #47 873.1 copying flashinfer/tllm_utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2173547Z #47 873.1 copying flashinfer/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2174309Z #47 873.1 copying flashinfer/_build_meta.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.2175102Z #47 873.1 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl 2025-09-07T10:37:04.2176006Z #47 873.1 copying flashinfer/cute_dsl/blockscaled_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl 2025-09-07T10:37:04.2177318Z #47 873.1 copying flashinfer/cute_dsl/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl 2025-09-07T10:37:04.2178132Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T10:37:04.2178812Z #47 873.1 copying ./custom_backend.py -> build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T10:37:04.2179680Z #47 873.1 copying ./setup.py -> build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T10:37:04.2180344Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T10:37:04.2181106Z #47 873.1 copying flashinfer/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T10:37:04.2182254Z #47 873.1 copying flashinfer/fused_moe/core.py -> build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T10:37:04.2194625Z #47 873.1 copying flashinfer/fused_moe/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T10:37:04.2195598Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2196355Z #47 873.1 copying flashinfer/jit/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2197688Z #47 873.1 copying flashinfer/jit/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2198522Z #47 873.1 copying flashinfer/jit/core.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2199520Z #47 873.1 copying flashinfer/jit/cpp_ext.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2200338Z #47 873.1 copying flashinfer/jit/cubin_loader.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2201242Z #47 873.1 copying flashinfer/jit/env.py -> 
build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2202072Z #47 873.1 copying flashinfer/jit/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T10:37:04.2202798Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T10:37:04.2203923Z #47 873.1 copying flashinfer/jit/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T10:37:04.2204944Z #47 873.1 copying flashinfer/jit/attention/pytorch.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T10:37:04.2206110Z #47 873.1 copying flashinfer/jit/attention/tvm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T10:37:04.2207084Z #47 873.1 copying flashinfer/jit/attention/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T10:37:04.2208370Z #47 873.1 copying flashinfer/jit/attention/variants.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T10:37:04.2209227Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T10:37:04.2210076Z #47 873.1 copying flashinfer/jit/cutlass_gemm/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T10:37:04.2211585Z #47 873.1 copying flashinfer/jit/cutlass_gemm/cutlass_library.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T10:37:04.2213015Z #47 873.1 copying flashinfer/jit/cutlass_gemm/generate_kernels.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T10:37:04.2214199Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/testing 2025-09-07T10:37:04.2214961Z #47 873.1 copying flashinfer/testing/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/testing 2025-09-07T10:37:04.2216023Z #47 873.1 copying flashinfer/testing/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/testing 2025-09-07T10:37:04.2216785Z #47 
873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2217745Z #47 873.1 copying flashinfer/triton/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2218709Z #47 873.1 copying flashinfer/triton/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2219994Z #47 873.1 copying flashinfer/triton/cascade.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2220946Z #47 873.1 copying flashinfer/triton/gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2222100Z #47 873.1 copying flashinfer/triton/norm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2222968Z #47 873.1 copying flashinfer/triton/page.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2224111Z #47 873.1 copying flashinfer/triton/sm_constraint_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2225126Z #47 873.1 copying flashinfer/triton/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T10:37:04.2226036Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/tuning_configs 2025-09-07T10:37:04.2227156Z #47 873.1 copying flashinfer/tuning_configs/v0_1_trtllm_fused_moe_NVIDIA_B200.py -> build/lib.linux-x86_64-cpython-312/flashinfer/tuning_configs 2025-09-07T10:37:04.2228236Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/profiler 2025-09-07T10:37:04.2229189Z #47 873.1 copying flashinfer/profiler/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/profiler 2025-09-07T10:37:04.2229970Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2231191Z #47 873.1 copying flashinfer/triton/kernels/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2232309Z #47 873.1 copying flashinfer/triton/kernels/activation.py -> 
build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2233611Z #47 873.1 copying flashinfer/triton/kernels/cascade.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2234740Z #47 873.1 copying flashinfer/triton/kernels/norm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2235796Z #47 873.1 copying flashinfer/triton/kernels/quant.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2237131Z #47 873.1 copying flashinfer/triton/kernels/sm_constraint_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T10:37:04.2238126Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2238930Z #47 873.1 copying flashinfer/comm/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2239853Z #47 873.1 copying flashinfer/comm/cuda_ipc.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2240694Z #47 873.1 copying flashinfer/comm/dlpack_utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2241815Z #47 873.1 copying flashinfer/comm/mapping.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2242635Z #47 873.1 copying flashinfer/comm/mnnvl.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2243443Z #47 873.1 copying flashinfer/comm/nvshmem.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2244496Z #47 873.1 copying flashinfer/comm/nvshmem_allreduce.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2245479Z #47 873.1 copying flashinfer/comm/trtllm_alltoall.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2246743Z #47 873.1 copying flashinfer/comm/trtllm_ar.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2247615Z #47 873.1 copying 
flashinfer/comm/trtllm_mnnvl_ar.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2248907Z #47 873.1 copying flashinfer/comm/vllm_ar.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T10:37:04.2249801Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T10:37:04.2250519Z #47 873.1 copying flashinfer/cudnn/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T10:37:04.2251723Z #47 873.1 copying flashinfer/cudnn/decode.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T10:37:04.2252608Z #47 873.1 copying flashinfer/cudnn/prefill.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T10:37:04.2253691Z #47 873.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2254642Z #47 873.1 copying flashinfer/logits_processor/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2256165Z #47 873.1 copying flashinfer/logits_processor/compiler.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2257358Z #47 873.1 copying flashinfer/logits_processor/fusion_rules.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2258734Z #47 873.1 copying flashinfer/logits_processor/legalization.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2260209Z #47 873.1 copying flashinfer/logits_processor/op.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2261427Z #47 873.1 copying flashinfer/logits_processor/operators.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2262541Z #47 873.1 copying flashinfer/logits_processor/pipeline.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2264085Z #47 873.1 copying flashinfer/logits_processor/processors.py -> 
build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2265331Z #47 873.1 copying flashinfer/logits_processor/types.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2266615Z #47 873.1 copying flashinfer/logits_processor/validators.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T10:37:04.2267585Z #47 873.2 copying flashinfer/py.typed -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T10:37:04.3150244Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3151025Z #47 873.2 copying ./csrc/activation.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3151927Z #47 873.2 copying ./csrc/aot_extension_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3153062Z #47 873.2 copying ./csrc/batch_attention.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3154032Z #47 873.2 copying ./csrc/batch_attention_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3155129Z #47 873.2 copying ./csrc/batch_attention_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3156232Z #47 873.2 copying ./csrc/batch_attention_paged_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3157193Z #47 873.2 copying ./csrc/batch_decode.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3158139Z #47 873.2 copying ./csrc/batch_decode_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3159223Z #47 873.2 copying ./csrc/batch_decode_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3160340Z #47 873.2 copying ./csrc/batch_decode_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 
2025-09-07T10:37:04.3161733Z #47 873.2 copying ./csrc/batch_decode_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3162820Z #47 873.2 copying ./csrc/batch_decode_mla_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3163889Z #47 873.2 copying ./csrc/batch_decode_mla_cute_sm80.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3164815Z #47 873.2 copying ./csrc/batch_decode_mla_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3166117Z #47 873.2 copying ./csrc/batch_decode_mla_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3167005Z #47 873.2 copying ./csrc/batch_decode_mla_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3168089Z #47 873.2 copying ./csrc/batch_mla_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3168933Z #47 873.2 copying ./csrc/batch_mla_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3169901Z #47 873.2 copying ./csrc/batch_mla_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3170777Z #47 873.2 copying ./csrc/batch_mla_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3171858Z #47 873.2 copying ./csrc/batch_mla_sm90_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3172869Z #47 873.2 copying ./csrc/batch_mla_sm90_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3173758Z #47 873.2 copying ./csrc/batch_mla_sm90_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3174679Z #47 873.2 copying ./csrc/batch_prefill.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3175700Z #47 873.2 copying ./csrc/batch_prefill_config.inc -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3176783Z #47 873.2 copying ./csrc/batch_prefill_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3178403Z #47 873.2 copying ./csrc/batch_prefill_fp8_paged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3179656Z #47 873.2 copying ./csrc/batch_prefill_fp8_ragged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3180667Z #47 873.2 copying ./csrc/batch_prefill_fp8_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3181962Z #47 873.2 copying ./csrc/batch_prefill_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3183105Z #47 873.2 copying ./csrc/batch_prefill_paged_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3184354Z #47 873.2 copying ./csrc/batch_prefill_paged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3185469Z #47 873.2 copying ./csrc/batch_prefill_ragged_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3186702Z #47 873.2 copying ./csrc/batch_prefill_ragged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3187662Z #47 873.2 copying ./csrc/batch_prefill_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3188697Z #47 873.2 copying ./csrc/batch_prefill_sm90_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3189689Z #47 873.2 copying ./csrc/batch_prefill_sm90_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3190933Z #47 873.2 copying ./csrc/batch_prefill_sm90_jit_pybind.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3192111Z #47 873.2 copying ./csrc/blackwell_fmha_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3192990Z #47 873.2 copying ./csrc/bmm_fp8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3193744Z #47 873.2 copying ./csrc/cascade.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3194821Z #47 873.2 copying ./csrc/cudnn_sdpa_kernel_launcher.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3195769Z #47 873.2 copying ./csrc/cudnn_sdpa_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3196659Z #47 873.2 copying ./csrc/cutlass_mla.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3197653Z #47 873.2 copying ./csrc/flashinfer_cascade_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3198547Z #47 873.2 copying ./csrc/flashinfer_gemm_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3199632Z #47 873.2 copying ./csrc/flashinfer_gemm_sm90_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3200521Z #47 873.2 copying ./csrc/flashinfer_mla_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3201579Z #47 873.2 copying ./csrc/flashinfer_norm_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3202544Z #47 873.2 copying ./csrc/flashinfer_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3203509Z #47 873.2 copying ./csrc/flashinfer_ops_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3204669Z #47 873.2 copying ./csrc/flashinfer_page_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3205641Z #47 873.2 copying 
./csrc/flashinfer_quantization_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3206547Z #47 873.2 copying ./csrc/flashinfer_rope_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3207474Z #47 873.2 copying ./csrc/flashinfer_sampling_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3208425Z #47 873.2 copying ./csrc/fmha_cutlass_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3209369Z #47 873.2 copying ./csrc/fmha_cutlass_sm100_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3210334Z #47 873.2 copying ./csrc/fp4_gemm_cutlass.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3211489Z #47 873.2 copying ./csrc/fp4_gemm_cutlass.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3212716Z #47 873.2 copying ./csrc/fp8_gemm_cutlass.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3213602Z #47 873.2 copying ./csrc/fp8_gemm_cutlass.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3214587Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:04.3215960Z #47 873.2 copying ./csrc/fused_moe/cutlass_backend/cutlass_fused_moe_instantiation.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:04.3217729Z #47 873.2 copying ./csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:04.3219471Z #47 873.2 copying ./csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:04.3220731Z #47 873.2 copying ./csrc/gemm_groupwise_sm100.cu 
-> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3221769Z #47 873.2 copying ./csrc/gemm_groupwise_sm100_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3222808Z #47 873.2 copying ./csrc/gemm_sm100_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3223741Z #47 873.2 copying ./csrc/group_gemm.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3224764Z #47 873.2 copying ./csrc/group_gemm_fp8_groupwise_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3225822Z #47 873.2 copying ./csrc/group_gemm_fp8_groupwise_sm100_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3227221Z #47 873.2 copying ./csrc/group_gemm_mxfp4_groupwise_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3228510Z #47 873.2 copying ./csrc/group_gemm_mxfp4_groupwise_sm100_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3229529Z #47 873.2 copying ./csrc/group_gemm_sm100_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3230527Z #47 873.2 copying ./csrc/group_gemm_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3231430Z #47 873.2 copying ./csrc/group_gemm_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3232283Z #47 873.2 copying ./csrc/logging.cc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3233364Z #47 873.2 copying ./csrc/norm.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3234151Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:04.3235185Z #47 873.2 copying ./csrc/nv_internal/cpp/common/envUtils.cpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:04.3236697Z #47 873.2 copying ./csrc/nv_internal/cpp/common/logger.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:04.3237950Z #47 873.2 copying ./csrc/nv_internal/cpp/common/memoryUtils.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:04.3239438Z #47 873.2 copying ./csrc/nv_internal/cpp/common/stringUtils.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:04.3240941Z #47 873.2 copying ./csrc/nv_internal/cpp/common/tllmException.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:04.3242089Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T10:37:04.3243508Z #47 873.2 copying ./csrc/nv_internal/cpp/kernels/quantization.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T10:37:04.3244764Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3246211Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/NvInferRuntime.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3248110Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/assert.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3250457Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/cudaBf16Wrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3252387Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/cudaFp8Utils.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3254266Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/cudaUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3256315Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/dataType.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3258125Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/logger.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3259771Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/quantization.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3261592Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/stringUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3263734Z #47 873.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/tllmException.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:04.3265274Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3266639Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/cublasMMWrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3268164Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/cudaBf16Fallbacks.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3269741Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/cudaDriverWrapper.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3271501Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/cudaTypeUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3273186Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/envUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3274726Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/memoryUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3276332Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/quantTypeUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3278072Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3279893Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/common/workspace.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:04.3281401Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:04.3283460Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_red_global.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:04.3286073Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_sm90_multimem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 
2025-09-07T10:37:04.3288853Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_traits_sm90_multimem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:04.3291797Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/grid_dependency_control.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:04.3294616Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:04.3296985Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T10:37:04.3299914Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective/sm90_allreduce_nvls_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T10:37:04.3302937Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/compute_occupancy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3305151Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T10:37:04.3307261Z #47 873.2 copying 
./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective/mixed_input_utils.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T10:37:04.3309432Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T10:37:04.3312059Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective/epilogue_moe_finalize.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T10:37:04.3314216Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 2025-09-07T10:37:04.3316417Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_allreduce_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 2025-09-07T10:37:04.3318600Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T10:37:04.3320681Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread/fused_activations.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T10:37:04.3323140Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue_helpers.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3325053Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:04.3327305Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_gated.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:04.3330280Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_interleaved.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:04.3333629Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_mixed_input.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:04.3336592Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_gated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3339456Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_interleaved.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 
2025-09-07T10:37:04.3342349Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3345316Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_array_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3348081Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_gated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3351214Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_interleaved.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3354174Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3357179Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3360154Z #47 873.2 
copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3363232Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_interleaved_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:04.3365286Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3367199Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/default_fpA_intB_traits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3369542Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3372272Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_routine.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3374965Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_traits.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3377638Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_moe_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3380334Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_universal_allreduce.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3383556Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/mixed_gemm_B_layout.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3386638Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cute_util.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3389170Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cutlass_kernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3391723Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3394398Z #47 873.2 copying 
./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3397216Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:04.3399403Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3401462Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3404154Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3406874Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3409567Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3412487Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma_bf16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3415160Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3417886Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3420708Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_finegrained.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3423632Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_percol.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3426286Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3429082Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_finegrained.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3431731Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_percol.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:04.3433713Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:04.3435640Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/default_mma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:04.3438150Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:04.3440654Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_dequantizer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:04.3442970Z #47 873.2 copying 
./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm_configs.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3445257Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3447529Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/system_barrier.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3450110Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/tile_interleaved_layout.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3452193Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T10:37:04.3454578Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T10:37:04.3456809Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T10:37:04.3458747Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util/gather_tensor.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T10:37:04.3461152Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/weight_only_quant_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:04.3462984Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:04.3464423Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:04.3466219Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:04.3468016Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_type_conversion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:04.3469565Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T10:37:04.3471282Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T10:37:04.3473497Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm_stub.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 
2025-09-07T10:37:04.3475195Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3476833Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3478937Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3480994Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3483044Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3485152Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3487199Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3489331Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_bf16_out_bf16.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3491735Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_f16_out_f16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3494047Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_bf16_out_bf16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3496363Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_f16_out_f16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3498639Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_per_col_f16_out_f16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3500867Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3503077Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3505234Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3507172Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3509124Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3511102Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3512985Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3514861Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3516836Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template_sm90.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:04.3518433Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T10:37:04.3520128Z #47 873.2 copying 
./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T10:37:04.3522278Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T10:37:04.3523886Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:04.3525281Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:04.3527085Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/cutlass_kernel_selector.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:04.3528921Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_gemm_kernels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:04.3530684Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_kernels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:04.3532830Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_util_kernels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:04.3534542Z #47 873.2 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3536364Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3538977Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3541563Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3543978Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3546263Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3548613Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:04.3551283Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_bf16.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3553433Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3555532Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3557637Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3560200Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3562401Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3564440Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3566474Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3568515Z #47 873.2 copying 
./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3570628Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp32_fp32.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3572953Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp4_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3575023Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3577396Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3579962Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_uint4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3582072Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3584311Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3586531Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws_mixed_dtype.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3588737Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3590863Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_tma_warp_specialized_traits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:04.3592644Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:04.3594106Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/delayStream.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:04.3595339Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T10:37:04.3596598Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/lora/lora.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T10:37:04.3598075Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/lora/lora.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T10:37:04.3599607Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:04.3601180Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:04.3602735Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/quantization.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:04.3604216Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/quantization.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:04.3605414Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T10:37:04.3606589Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/runtime/torchUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T10:37:04.3607760Z #47 873.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3608908Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3610289Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3611966Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3613399Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3614839Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3616247Z #47 873.2 copying ./csrc/nv_internal/tensorrt_llm/thop/thUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:04.3617394Z #47 873.2 copying ./csrc/nvshmem_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3618216Z #47 873.2 copying ./csrc/page.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3618979Z #47 873.2 copying ./csrc/pod.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3619761Z #47 873.2 copying ./csrc/pod_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3620647Z #47 873.2 copying ./csrc/pod_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3621525Z #47 873.2 copying ./csrc/pod_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3622400Z #47 873.2 copying ./csrc/pod_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3623508Z #47 873.2 copying ./csrc/pytorch_conversion_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3624394Z #47 873.2 copying ./csrc/pytorch_extension_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3625237Z #47 873.2 copying ./csrc/quantization.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3625988Z #47 873.2 copying ./csrc/renorm.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3626708Z #47 873.2 copying ./csrc/rope.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3627437Z #47 873.2 copying ./csrc/runtime_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3628205Z #47 873.2 
copying ./csrc/sampling.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3628971Z #47 873.2 copying ./csrc/single_decode.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3629789Z #47 873.2 copying ./csrc/single_decode_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3630717Z #47 873.2 copying ./csrc/single_decode_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3631678Z #47 873.2 copying ./csrc/single_decode_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3632591Z #47 873.3 copying ./csrc/single_decode_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3633463Z #47 873.3 copying ./csrc/single_prefill.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3634291Z #47 873.3 copying ./csrc/single_prefill_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3635232Z #47 873.3 copying ./csrc/single_prefill_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3636187Z #47 873.3 copying ./csrc/single_prefill_fp8_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3637131Z #47 873.3 copying ./csrc/single_prefill_fp8_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3638087Z #47 873.3 copying ./csrc/single_prefill_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3638996Z #47 873.3 copying ./csrc/single_prefill_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3639882Z #47 873.3 copying ./csrc/single_prefill_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3640744Z #47 873.3 copying 
./csrc/single_prefill_sm90_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3641720Z #47 873.3 copying ./csrc/single_prefill_sm90_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3642708Z #47 873.3 copying ./csrc/single_prefill_sm90_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3643689Z #47 873.3 copying ./csrc/single_prefill_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3644604Z #47 873.3 copying ./csrc/trtllm_allreduce.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3645472Z #47 873.3 copying ./csrc/trtllm_allreduce_fusion.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3646324Z #47 873.3 copying ./csrc/trtllm_alltoall.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3647188Z #47 873.3 copying ./csrc/trtllm_batched_gemm_runner.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3648083Z #47 873.3 copying ./csrc/trtllm_fmha_kernel_launcher.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3649350Z #47 873.3 copying ./csrc/trtllm_fused_moe_dev_kernel.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3650518Z #47 873.3 copying ./csrc/trtllm_fused_moe_kernel_launcher.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3651598Z #47 873.3 copying ./csrc/trtllm_fused_moe_routing_deepseek.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3652621Z #47 873.3 copying ./csrc/trtllm_fused_moe_routing_llama4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3653645Z #47 873.3 copying ./csrc/trtllm_fused_moe_routing_renormalize.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 
2025-09-07T10:37:04.3654643Z #47 873.3 copying ./csrc/trtllm_fused_moe_runner.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3655549Z #47 873.3 copying ./csrc/trtllm_gemm_runner.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3656449Z #47 873.3 copying ./csrc/trtllm_mnnvl_allreduce.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3657401Z #47 873.3 copying ./csrc/trtllm_moe_allreduce_fusion.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3658412Z #47 873.3 copying ./csrc/vllm_custom_all_reduce.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T10:37:04.3659253Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3660220Z #47 873.3 copying ./include/flashinfer/activation.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3661335Z #47 873.3 copying ./include/flashinfer/allocator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3662477Z #47 873.3 copying ./include/flashinfer/arch_condition.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3663920Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3665270Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/fmha_common.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3667169Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/fmha_fusion.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3669372Z #47 873.3 copying 
./include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_epilogue_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3671581Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_mainloop_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3673764Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_epilogue_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3675880Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_mainloop_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3678040Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_load_cpasync_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3680131Z #47 873.3 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_load_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:04.3681792Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T10:37:04.3683178Z #47 873.3 copying ./include/flashinfer/attention/blackwell/common/pow_2.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T10:37:04.3684423Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T10:37:04.3685659Z #47 873.3 copying 
./include/flashinfer/attention/blackwell/device/fmha.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T10:37:04.3687216Z #47 873.3 copying ./include/flashinfer/attention/blackwell/device/sm100_mla.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T10:37:04.3688744Z #47 873.3 copying ./include/flashinfer/attention/blackwell/fmha_cutlass_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T10:37:04.3689961Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3691484Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/fmha_options.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3693568Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/fmha_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3695407Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/gather_tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3697370Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_fwd_kernel_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3699491Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_gen_kernel_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3701480Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_reduction.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3703454Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3705487Z #47 873.3 copying ./include/flashinfer/attention/blackwell/kernel/sm100_mla_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:04.3707157Z #47 873.3 copying ./include/flashinfer/attention/blackwell/plan.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T10:37:04.3708597Z #47 873.3 copying ./include/flashinfer/attention/cascade.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3709919Z #47 873.3 copying ./include/flashinfer/attention/cutlass_mla.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3711275Z #47 873.3 copying ./include/flashinfer/attention/decode.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3712719Z #47 873.3 copying ./include/flashinfer/attention/decode_mla_cute_sm80.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3714083Z #47 873.3 copying ./include/flashinfer/attention/default_decode_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3715483Z #47 873.3 copying ./include/flashinfer/attention/default_prefill_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3716792Z #47 873.3 copying ./include/flashinfer/attention/heap.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 
2025-09-07T10:37:04.3718024Z #47 873.3 copying ./include/flashinfer/attention/hopper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3719104Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3720317Z #47 873.3 copying ./include/flashinfer/attention/hopper/attention_updater.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3722010Z #47 873.3 copying ./include/flashinfer/attention/hopper/block_sparse_gather.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3723571Z #47 873.3 copying ./include/flashinfer/attention/hopper/default_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3725105Z #47 873.3 copying ./include/flashinfer/attention/hopper/epilogue.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3726611Z #47 873.3 copying ./include/flashinfer/attention/hopper/kernel_traits.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3728110Z #47 873.3 copying ./include/flashinfer/attention/hopper/mainloop.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3729593Z #47 873.3 copying ./include/flashinfer/attention/hopper/mainloop_mma.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3731218Z #47 873.3 copying ./include/flashinfer/attention/hopper/named_barrier.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3732952Z #47 873.3 copying ./include/flashinfer/attention/hopper/prefill_sm90.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3734291Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3735793Z #47 873.3 copying ./include/flashinfer/attention/hopper/quantization/epilogue.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3737692Z #47 873.3 copying ./include/flashinfer/attention/hopper/quantization/kernel_traits.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3739594Z #47 873.3 copying ./include/flashinfer/attention/hopper/quantization/mainloop_load.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3741544Z #47 873.3 copying ./include/flashinfer/attention/hopper/quantization/mainloop_mma.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3743609Z #47 873.3 copying ./include/flashinfer/attention/hopper/quantization/mainloop_sparse_load.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3745318Z #47 873.3 copying ./include/flashinfer/attention/hopper/quantization/prefill_sm90.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:04.3746870Z #47 873.3 copying ./include/flashinfer/attention/hopper/sparse_mainloop.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3748272Z #47 873.3 copying ./include/flashinfer/attention/hopper/tile_scheduler.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3750042Z #47 873.3 copying 
./include/flashinfer/attention/hopper/utils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3751587Z #47 873.3 copying ./include/flashinfer/attention/hopper/variant_helper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3753130Z #47 873.3 copying ./include/flashinfer/attention/hopper/variants.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:04.3754551Z #47 873.3 copying ./include/flashinfer/attention/mask.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3755845Z #47 873.3 copying ./include/flashinfer/attention/mla.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3757163Z #47 873.3 copying ./include/flashinfer/attention/mla_hopper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3758583Z #47 873.3 copying ./include/flashinfer/attention/mla_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3759955Z #47 873.3 copying ./include/flashinfer/attention/persistent.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3761489Z #47 873.3 copying ./include/flashinfer/attention/persistent_template.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3762797Z #47 873.3 copying ./include/flashinfer/attention/pod.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3764025Z #47 873.3 copying ./include/flashinfer/attention/prefill.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3765355Z #47 873.3 copying 
./include/flashinfer/attention/scheduler.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3766624Z #47 873.3 copying ./include/flashinfer/attention/state.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3767902Z #47 873.3 copying ./include/flashinfer/attention/variant_helper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3769219Z #47 873.3 copying ./include/flashinfer/attention/variants.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:04.3770413Z #47 873.3 copying ./include/flashinfer/attention_impl.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3771591Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:04.3772697Z #47 873.3 copying ./include/flashinfer/comm/trtllm_allreduce.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:04.3774018Z #47 873.3 copying ./include/flashinfer/comm/trtllm_allreduce_fusion.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:04.3775381Z #47 873.3 copying ./include/flashinfer/comm/trtllm_alltoall.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:04.3776702Z #47 873.3 copying ./include/flashinfer/comm/trtllm_mnnvl_allreduce.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:04.3778058Z #47 873.3 copying ./include/flashinfer/comm/trtllm_moe_allreduce_fusion.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:04.3779414Z #47 873.3 copying ./include/flashinfer/comm/vllm_custom_all_reduce.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 
2025-09-07T10:37:04.3780628Z #47 873.3 copying ./include/flashinfer/cp_async.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3781729Z #47 873.3 copying ./include/flashinfer/cubin_loader.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3782876Z #47 873.3 copying ./include/flashinfer/cutlass_utils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3784086Z #47 873.3 copying ./include/flashinfer/exception.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3785124Z #47 873.3 copying ./include/flashinfer/fastdiv.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3786135Z #47 873.3 copying ./include/flashinfer/fp16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3787142Z #47 873.3 copying ./include/flashinfer/fp4_layout.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3788241Z #47 873.3 copying ./include/flashinfer/frag_layout_swizzle.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3789213Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3790159Z #47 873.3 copying ./include/flashinfer/gemm/bmm_fp8.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3791340Z #47 873.3 copying ./include/flashinfer/gemm/cutlass_gemm_configs.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3792534Z #47 873.3 copying ./include/flashinfer/gemm/fp4_gemm_cutlass.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3793758Z #47 873.3 copying ./include/flashinfer/gemm/fp4_gemm_cutlass_template.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3795047Z #47 873.3 copying ./include/flashinfer/gemm/fp4_gemm_template_sm100.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3796250Z #47 873.3 copying ./include/flashinfer/gemm/fp8_gemm_cutlass.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3797474Z #47 873.3 copying ./include/flashinfer/gemm/fp8_gemm_cutlass_template.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3798711Z #47 873.3 copying ./include/flashinfer/gemm/fp8_gemm_template_sm100.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3799954Z #47 873.3 copying ./include/flashinfer/gemm/gemm_groupwise_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3801162Z #47 873.3 copying ./include/flashinfer/gemm/group_gemm.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3802621Z #47 873.3 copying ./include/flashinfer/gemm/group_gemm_fp8_groupwise_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3803996Z #47 873.3 copying ./include/flashinfer/gemm/group_gemm_lora.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3805323Z #47 873.3 copying ./include/flashinfer/gemm/group_gemm_mxfp4_groupwise_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3806581Z #47 873.3 copying ./include/flashinfer/gemm/group_gemm_sm90.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3807749Z #47 873.3 copying ./include/flashinfer/gemm/group_gemv.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:04.3808835Z #47 873.3 copying ./include/flashinfer/layout.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3809862Z #47 873.3 copying ./include/flashinfer/logging.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3810879Z #47 873.3 copying ./include/flashinfer/math.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3812134Z #47 873.3 copying ./include/flashinfer/mma.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3813185Z #47 873.3 copying ./include/flashinfer/norm.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3814242Z #47 873.3 copying ./include/flashinfer/page.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3815339Z #47 873.3 copying ./include/flashinfer/permuted_smem.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3816466Z #47 873.3 copying ./include/flashinfer/pos_enc.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3817564Z #47 873.3 copying ./include/flashinfer/profiler.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3818752Z #47 873.3 copying ./include/flashinfer/quantization.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3819894Z #47 873.3 copying ./include/flashinfer/sampling.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.3820937Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T10:37:04.3822237Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/KernelRunner.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T10:37:04.3823789Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3825584Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmEnums.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3827656Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmInterface.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3829720Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3831703Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/Enums.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3833722Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmGatedActOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3835769Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.3837795Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.4175712Z #47 873.3 copying 
./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParamsDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.4177859Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelTraits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.4179961Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/TmaDescriptor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:04.4181668Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:04.4183492Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CommonUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:04.4185893Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CudaKernelLauncher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:04.4188142Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/DtypeDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:04.4190531Z #47 873.3 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/MmaDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:04.4192841Z #47 873.3 copying 
./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/SfLayoutDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:04.4194600Z #47 873.3 copying ./include/flashinfer/trtllm/common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm 2025-09-07T10:37:04.4195706Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:04.4197023Z #47 873.3 copying ./include/flashinfer/trtllm/common/cudaBf16Fallbacks.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:04.4198528Z #47 873.3 copying ./include/flashinfer/trtllm/common/cudaBf16Wrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:04.4199992Z #47 873.3 copying ./include/flashinfer/trtllm/common/cudaFp8Utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:04.4201444Z #47 873.3 copying ./include/flashinfer/trtllm/common/cudaTypeUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:04.4202884Z #47 873.3 copying ./include/flashinfer/trtllm/common/cudaUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:04.4204118Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T10:37:04.4205364Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/cubin/kernelMetaInfo.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T10:37:04.4206916Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/decoder_impl_common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4208324Z #47 873.3 
copying ./include/flashinfer/trtllm/fmha/decoder_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4209705Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/fmhaKernels.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4211196Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/fmhaRunner.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4212816Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/fmhaRunnerParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4214663Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/kernelParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4216182Z #47 873.3 copying ./include/flashinfer/trtllm/fmha/lse.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:04.4217312Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4218530Z #47 873.3 copying ./include/flashinfer/trtllm/fused_moe/DevKernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4220013Z #47 873.3 copying ./include/flashinfer/trtllm/fused_moe/IntFastDiv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4221523Z #47 873.3 copying ./include/flashinfer/trtllm/fused_moe/RoutingKernel.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4223119Z #47 873.3 copying ./include/flashinfer/trtllm/fused_moe/RoutingKernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4224751Z #47 873.3 copying 
./include/flashinfer/trtllm/fused_moe/RoutingKernelTopK.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4226215Z #47 873.3 copying ./include/flashinfer/trtllm/fused_moe/runner.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:04.4227481Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4228964Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/Enums.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4230799Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmInterface.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4232663Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4234508Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4236367Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelTraits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4238275Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/TmaDescriptor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:04.4239829Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 
2025-09-07T10:37:04.4241537Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CommonUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:04.4243691Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CudaKernelLauncher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:04.4245820Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/DtypeDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:04.4247876Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/MmaDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:04.4250384Z #47 873.3 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/SfLayoutDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:04.4252062Z #47 873.3 copying ./include/flashinfer/utils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.4253170Z #47 873.3 copying ./include/flashinfer/vec_dtypes.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T10:37:04.4254083Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4254902Z #47 873.3 copying ./tvm_binding/batch_decode.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4256035Z #47 873.3 copying ./tvm_binding/batch_decode_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4257151Z #47 873.3 copying 
./tvm_binding/batch_decode_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4258211Z #47 873.3 copying ./tvm_binding/batch_mla_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4259267Z #47 873.3 copying ./tvm_binding/batch_mla_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4260263Z #47 873.3 copying ./tvm_binding/batch_mla_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4261270Z #47 873.3 copying ./tvm_binding/batch_mla_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4262234Z #47 873.3 copying ./tvm_binding/batch_prefill.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4263397Z #47 873.3 copying ./tvm_binding/batch_prefill_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4264510Z #47 873.3 copying ./tvm_binding/batch_prefill_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4265524Z #47 873.3 copying ./tvm_binding/batch_prefill_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4266607Z #47 873.3 copying ./tvm_binding/batch_prefill_sm90_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4267752Z #47 873.3 copying ./tvm_binding/batch_prefill_sm90_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4268808Z #47 873.3 copying ./tvm_binding/sampling.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4269785Z #47 873.3 copying ./tvm_binding/sampling_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4270775Z #47 873.3 copying 
./tvm_binding/tvm_binding_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T10:37:04.4271650Z #47 873.3 copying ./version.txt -> build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T10:37:04.4272340Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/logging 2025-09-07T10:37:04.4273236Z #47 873.3 copying build/aot-ops-package-dir/logging/logging.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/logging 2025-09-07T10:37:04.4274759Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4277982Z #47 873.3 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4281236Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4284633Z #47 873.3 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4287931Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4291000Z #47 873.3 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4294345Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4297658Z #47 873.3 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4300938Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4304388Z #47 873.3 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4307668Z #47 873.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4311381Z #47 873.3 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4314702Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4317721Z #47 873.4 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4320816Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4324120Z #47 873.4 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.4327267Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4330495Z #47 873.4 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4334171Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4337752Z #47 873.4 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.4341195Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5256657Z #47 873.4 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5260016Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5263666Z #47 873.4 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5267020Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5270288Z #47 873.4 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5273492Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5276976Z #47 873.4 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5280243Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5283216Z #47 873.4 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5286238Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5289367Z #47 873.4 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5292866Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5296285Z #47 873.4 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5299725Z #47 873.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5303272Z #47 873.4 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5306750Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5309829Z #47 873.5 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5312901Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5316112Z #47 873.5 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.5319653Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5323036Z #47 873.5 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5326477Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5330101Z #47 873.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.5333791Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6275537Z #47 873.5 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6278961Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6282367Z #47 873.5 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6285610Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6288943Z #47 873.5 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6292650Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6296276Z #47 873.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6299882Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6303080Z #47 873.5 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6306403Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6309728Z #47 873.5 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6313035Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6316387Z #47 873.5 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6319746Z #47 873.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6323231Z #47 873.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:04.6326633Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6329775Z #47 873.6 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6333204Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6336640Z #47 873.6 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:04.6340079Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6343823Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6347322Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6351316Z #47 873.6 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6354985Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6358583Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6362462Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6366101Z #47 873.6 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6369707Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6373558Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.6377268Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.7279214Z #47 873.6 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:04.7282837Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:04.7285875Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:04.7288885Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:04.7292243Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:04.7295500Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:04.7298685Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:04.7301929Z #47 873.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:04.7305253Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:04.7308078Z #47 873.6 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:04.7310932Z #47 873.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:04.7313750Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:04.7316587Z #47 873.7 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:04.7319419Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:04.7322309Z #47 873.7 copying 
build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:04.7325220Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:04.7328398Z #47 873.7 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:04.7331045Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T10:37:04.7332327Z #47 873.7 copying build/aot-ops-package-dir/fmha_cutlass_sm100a/fmha_cutlass_sm100a.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T10:37:04.7333919Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:04.7336775Z #47 873.7 copying 
build/aot-ops-package-dir/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:04.7339624Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:04.7342515Z #47 873.7 copying build/aot-ops-package-dir/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:04.7345413Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:04.7348221Z #47 873.7 copying build/aot-ops-package-dir/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:04.7351423Z #47 873.7 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:04.7354406Z #47 873.7 copying build/aot-ops-package-dir/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:04.7356880Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/mla 2025-09-07T10:37:04.7357719Z #47 873.7 copying build/aot-ops-package-dir/mla/mla.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/mla 2025-09-07T10:37:04.7358589Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/cascade 2025-09-07T10:37:04.7359521Z #47 873.7 copying build/aot-ops-package-dir/cascade/cascade.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/cascade 2025-09-07T10:37:04.7360472Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/norm 2025-09-07T10:37:04.7361346Z #47 873.7 copying build/aot-ops-package-dir/norm/norm.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/norm 2025-09-07T10:37:04.7362240Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/page 2025-09-07T10:37:04.7363013Z #47 873.7 copying build/aot-ops-package-dir/page/page.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/page 2025-09-07T10:37:04.7363813Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/quantization 2025-09-07T10:37:04.7364729Z #47 873.7 copying build/aot-ops-package-dir/quantization/quantization.so -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/quantization 2025-09-07T10:37:04.7365626Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/rope 2025-09-07T10:37:04.7366385Z #47 873.7 copying build/aot-ops-package-dir/rope/rope.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/rope 2025-09-07T10:37:04.7367211Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/sampling 2025-09-07T10:37:04.8276650Z #47 873.7 copying build/aot-ops-package-dir/sampling/sampling.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/sampling 2025-09-07T10:37:04.8277729Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/trtllm_utils 2025-09-07T10:37:04.8278903Z #47 873.7 copying build/aot-ops-package-dir/trtllm_utils/trtllm_utils.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/trtllm_utils 2025-09-07T10:37:04.8279986Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8281159Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/axpby.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8282579Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/clear.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8284071Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/cooperative_copy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8285628Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/cooperative_gemm.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8287105Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/copy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8288512Z #47 873.7 copying 
3rdparty/cutlass/include/cute/algorithm/fill.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8289967Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/functional.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8291706Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/gemm.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8293180Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/prefer.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8294764Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/prefetch.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8296321Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/tensor_algorithms.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8297894Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/tensor_reduce.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8299507Z #47 873.7 copying 3rdparty/cutlass/include/cute/algorithm/tuple_algorithms.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:04.8300716Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8301851Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/cluster_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8303362Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/cluster_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8304705Z #47 873.7 copying 
3rdparty/cutlass/include/cute/arch/config.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8306021Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8307328Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8308716Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm100_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8310043Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm50.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8311397Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8312715Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8314019Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8315357Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm90_desc.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8316697Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm90_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8318009Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8319299Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm100.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8320623Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm100_desc.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8321977Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm100_umma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8323317Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm120.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8324650Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm120_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8326031Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm61.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8327318Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm70.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8328623Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8329926Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8331513Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm89.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8332860Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8334223Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_desc.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8335600Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8337002Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8338440Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8339926Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8341355Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/simd_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8342784Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/tmem_allocator_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8344272Z #47 873.7 copying 3rdparty/cutlass/include/cute/arch/util.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:04.8345309Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8346360Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_atom.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8347690Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8349437Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm100.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8350897Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm100_im2col.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8352374Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm100_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8353799Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm50.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8355225Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8356651Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8358128Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8359578Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90_im2col.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8361047Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8362593Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8364047Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_atom.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8365360Z #47 873.7 copying 
3rdparty/cutlass/include/cute/atom/mma_traits.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8366702Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8368072Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm120.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8369465Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm120_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8370907Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm61.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8372541Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm70.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8373988Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8375406Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8376813Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm89.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8378204Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8379638Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8381111Z #47 
873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8382604Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8384219Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8385651Z #47 873.7 copying 3rdparty/cutlass/include/cute/atom/partitioner.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:04.8386950Z #47 873.7 copying 3rdparty/cutlass/include/cute/config.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8387979Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8389166Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/alignment.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8390623Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8392084Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/array_aligned.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8393568Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/array_subbyte.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8395068Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/bit_field.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8396505Z #47 873.7 
copying 3rdparty/cutlass/include/cute/container/cuda_types.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8397949Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/tuple.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8399384Z #47 873.7 copying 3rdparty/cutlass/include/cute/container/type_list.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:04.8400706Z #47 873.7 copying 3rdparty/cutlass/include/cute/int_tuple.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8401914Z #47 873.7 copying 3rdparty/cutlass/include/cute/layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8403208Z #47 873.7 copying 3rdparty/cutlass/include/cute/layout_composed.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8404268Z #47 873.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8405471Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/arithmetic_tuple.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8406911Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/complex.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8408281Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/int.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8409694Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/integer_sequence.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8411237Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/integral_constant.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8412944Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/integral_ratio.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8414403Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/math.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8415840Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/numeric_types.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8417288Z #47 873.7 copying 3rdparty/cutlass/include/cute/numeric/real.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:04.8418619Z #47 873.7 copying 3rdparty/cutlass/include/cute/pointer.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8419886Z #47 873.7 copying 3rdparty/cutlass/include/cute/pointer_base.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8421240Z #47 873.8 copying 3rdparty/cutlass/include/cute/pointer_flagged.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8422556Z #47 873.8 copying 3rdparty/cutlass/include/cute/pointer_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8423969Z #47 873.8 copying 3rdparty/cutlass/include/cute/pointer_swizzle.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8425219Z #47 873.8 copying 3rdparty/cutlass/include/cute/stride.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8426448Z #47 873.8 copying 3rdparty/cutlass/include/cute/swizzle.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 
2025-09-07T10:37:04.8427692Z #47 873.8 copying 3rdparty/cutlass/include/cute/swizzle_layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8428929Z #47 873.8 copying 3rdparty/cutlass/include/cute/tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8430141Z #47 873.8 copying 3rdparty/cutlass/include/cute/tensor_impl.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8431372Z #47 873.8 copying 3rdparty/cutlass/include/cute/tensor_zip.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8432602Z #47 873.8 copying 3rdparty/cutlass/include/cute/underscore.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:04.8433629Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8434713Z #47 873.8 copying 3rdparty/cutlass/include/cute/util/debug.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8435995Z #47 873.8 copying 3rdparty/cutlass/include/cute/util/print.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8437347Z #47 873.8 copying 3rdparty/cutlass/include/cute/util/print_latex.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8438685Z #47 873.8 copying 3rdparty/cutlass/include/cute/util/print_svg.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8440010Z #47 873.8 copying 3rdparty/cutlass/include/cute/util/print_tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8441367Z #47 873.8 copying 3rdparty/cutlass/include/cute/util/type_traits.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:04.8442417Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8443471Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/aligned_buffer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8444553Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8445637Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/arch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8446998Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/barrier.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8462207Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/cache_operation.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8463928Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/config.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8465383Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/grid_dependency_control.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8466967Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/memory.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8468343Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/memory_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8469749Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/memory_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8471116Z #47 873.8 
copying 3rdparty/cutlass/include/cutlass/arch/mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8472513Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm100.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8473884Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm50.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8475251Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm60.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8476654Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm61.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8477995Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8479355Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8480764Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8482113Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm89.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8483524Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm90.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8484914Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/mma_sparse_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8486328Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/arch/mma_sparse_sm89.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8487748Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/reg_reconfig.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8489109Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/simd.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8490477Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/simd_sm60.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8492177Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/simd_sm61.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8493605Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/synclog.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8495027Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8496433Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/wmma_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8497877Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/wmma_sm72.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8499296Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/arch/wmma_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:04.8500637Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8501959Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/array_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8503438Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/array_subbyte.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8504734Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/barrier.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8505999Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/bfloat16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8507253Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8508485Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/blas3_types.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8509778Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/block_striped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8511098Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/cluster_launch.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8512406Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8513678Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/constants.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8514873Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:04.8516387Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm100_common.inl -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:04.8518354Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm100_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:04.8520301Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm90_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:04.8522245Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:04.8524128Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/collective_builder.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:04.8525896Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/collective_conv.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:04.8527607Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:04.8529442Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:04.8531529Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:04.8533497Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/conv2d_problem_size.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:04.8535023Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/conv3d_problem_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:04.8536551Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/convnd_problem_shape.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:04.8538122Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:04.8539572Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:04.8540753Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:04.8542102Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/device/conv_universal_adapter.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:04.8543922Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/device/direct_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:04.8545634Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/device/implicit_gemm_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:04.8547383Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:04.8549420Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/dispatch_policy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 
2025-09-07T10:37:04.8550657Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8551961Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/conv_universal.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8553629Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8555308Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8557007Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8558728Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8560518Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8562429Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8564217Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8566008Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_group_fprop.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8567692Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8569366Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8571165Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8573025Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8574771Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8576586Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8578356Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_wgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8580114Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv2d.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8581868Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8583739Z 
#47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv3d.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8585439Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8587174Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_depthwise_fprop.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8588839Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/direct_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8590539Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8592297Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8594105Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8595943Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8597816Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8599721Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8601582Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:04.8602970Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T10:37:04.8604221Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/thread/depthwise_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T10:37:04.8605544Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8607067Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8609124Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8611288Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8613681Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8615913Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8618130Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8620346Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8622534Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8624752Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8626825Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8628906Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8630991Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8632891Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8634672Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8636612Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8638716Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8640862Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8643013Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8645116Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8647165Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8649684Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8651971Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8654211Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8656390Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8658545Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8660672Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8662730Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8664638Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8666744Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8668858Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8671046Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8673058Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8675161Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8677532Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8679661Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8681754Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8683803Z #47 
873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8685659Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_mma_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8687521Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8689511Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8691635Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8693522Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8695482Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8697547Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8699592Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8701542Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/threadblock_swizzle.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:04.8702938Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:04.8704295Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/warp/mma_depthwise_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:04.8705983Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:04.8707659Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/conv/warp/scale_bias_relu_transform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:04.8709078Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8710351Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/core_io.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8711630Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/cuda_host_adapter.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8712934Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/cutlass.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8713996Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8715220Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/blockwise_scale_layout.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8716747Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/cluster.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8718231Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/collective.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8719495Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T10:37:04.8720900Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/collective/mixed_input_utils.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T10:37:04.8722600Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/dependent_false.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8724112Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/helper_macros.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8725581Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8727147Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8728697Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/mma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8730194Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/sm100_blockscaled_layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8731990Z 
#47 873.8 copying 3rdparty/cutlass/include/cutlass/detail/sm100_tmem_helper.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:04.8733456Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/device_kernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.8734718Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:04.8736359Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:04.8738521Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm120_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:04.8740637Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm120_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:04.8742732Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm90_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:04.8744928Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm90_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:04.8746906Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/collective_builder.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8748955Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/collective/collective_epilogue.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8751073Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/default_epilogue.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8753035Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/default_epilogue_array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8754990Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8756967Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8758993Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8761064Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8763184Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8765168Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 
2025-09-07T10:37:04.8767147Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.8769109Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.9279109Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.9281274Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.9283554Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:04.9285439Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/dispatch_policy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue 2025-09-07T10:37:04.9286727Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9288143Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/callbacks.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9289852Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/operations.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9292005Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9294033Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9296064Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9298146Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9300168Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9302220Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9304324Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9306289Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9308232Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9310168Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9312032Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:04.9313417Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9314761Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/activation.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9316454Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/conversion_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9318168Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9319871Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9321696Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9323610Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_relu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9325432Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_clamp.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9327235Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_dgelu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9329038Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_drelu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9330837Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_gelu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9332955Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_generic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9334917Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9336919Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_hardswish.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9338817Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9340712Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9342637Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9344607Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_relu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9346405Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_relu0.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9348254Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_residual_block.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9350494Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_sigmoid.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9352378Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_silu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9354392Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9356371Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9358236Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/reduction_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9360264Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/thread/scale_type.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:04.9361746Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9363288Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9365344Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9367362Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9369403Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9371646Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9373628Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9375670Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9377739Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9379778Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9381842Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9384010Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9386000Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9387956Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9389932Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9391910Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9393898Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9395943Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9397834Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9399654Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9401534Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9403427Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_depthwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9405365Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_direct_store.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9407328Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9409260Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9411266Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9413492Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9415548Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9417561Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9419540Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9421530Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9423625Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9425633Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9427580Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_workspace.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9429095Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:04.9430691Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:04.9432701Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:04.9434749Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:04.9436770Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:04.9438772Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitors.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:04.9440770Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/interleaved_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9442750Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/output_iterator_parameter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9444666Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9446606Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9448595Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9451182Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9453346Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9455416Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9457493Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9459589Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9461750Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 
2025-09-07T10:37:04.9463942Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9465926Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9467893Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9469860Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:04.9471333Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9472735Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9474614Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9476494Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9478249Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9480086Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9481917Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9483637Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/simt_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9485268Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tensor_op_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9486936Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9488631Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9490359Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9492377Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9494196Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9496016Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/volta_tensor_op_policy.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9497771Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:04.9499293Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/exmy_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.9500529Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:04.9502231Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/device/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:04.9504527Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:04.9506708Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/device/full_barrier.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:04.9508337Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:04.9509941Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/kernel/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:04.9512116Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 
2025-09-07T10:37:04.9514311Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/kernel/full_barrier.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:04.9515948Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T10:37:04.9517668Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T10:37:04.9519952Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T10:37:04.9521707Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/fast_math.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.9522952Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/float8.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.9524223Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/float_subbyte.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.9525545Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/floating_point_nvrtc.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.9526883Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/functional.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:04.9528068Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9529650Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9532001Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9534210Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9536390Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9538438Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9540472Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9542536Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_simt_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9544676Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9546703Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9548941Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9551290Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9553450Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_blockwise_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9555486Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9557492Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9559551Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9561676Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm1xx_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9563636Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 
2025-09-07T10:37:04.9565581Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9567562Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9569533Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9571753Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:04.9573768Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_builder.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9575634Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_builder_decl.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9577469Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_mma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9579269Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_mma_decl.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9581083Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/fp8_accumulation.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 
2025-09-07T10:37:04.9583057Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9585187Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9587173Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9589123Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9591107Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9593135Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9595051Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9596970Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9598950Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9600874Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9602786Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9604632Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9606486Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9608417Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_mma_array_tma_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9610238Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9612279Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_mma_tma_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9614148Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9615962Z #47 
873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm70_mma_twostage.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9617851Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm80_mma_array_multistage.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9619722Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm80_mma_multistage.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9621705Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9623895Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9625874Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9627966Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9630059Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9632045Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9634068Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9636017Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9637928Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9639749Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9641662Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9643722Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9645780Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:04.9647762Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 
2025-09-07T10:37:04.9649372Z #47 873.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9650835Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/base_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9652654Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/default_gemm_configuration.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9654376Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/ell_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9655931Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9657506Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9659116Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_batched.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9660730Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9662351Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9664138Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9665776Z #47 873.8 copying 
3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9667384Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9669103Z #47 873.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9670853Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9672522Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9674187Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_splitk_parallel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9675793Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9677469Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9679123Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9680834Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9682589Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9684309Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9686023Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_with_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9687632Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9689122Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/rank_2k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9690667Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/rank_2k_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9692467Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/rank_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9694006Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/symm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9695557Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/device/trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:04.9697101Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/dispatch_policy.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:04.9698549Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:04.9700017Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/gemm_enumerated_types.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:04.9701600Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/group_array_problem_shape.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:04.9702866Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9704307Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_ell_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9705887Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9707506Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9709195Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9710911Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9712756Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9714618Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9716425Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9718185Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9719878Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9721693Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9723494Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9725249Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9726989Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9728757Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9730493Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9732465Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9734259Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9736020Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9737824Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9739607Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9741309Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9742936Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9744743Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9746412Z #47 
873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9748075Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9750050Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9751729Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9753513Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9755244Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_symm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9756902Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_symm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9758624Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_symm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9760306Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9762059Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_trmm_complex.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9763733Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_trmm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9765339Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/ell_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9766840Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9768366Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9769933Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_batched.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9771772Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9773468Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9775241Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9777050Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9778936Z #47 873.9 copying 
3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9780645Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9782266Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9784029Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9785696Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9787393Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9789142Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9790843Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_splitk_parallel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9792547Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9794270Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_transpose_operands.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9795905Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9797524Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9799158Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_decl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9800798Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_streamk.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9802485Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9804238Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9805951Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9807587Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9809250Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_with_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9810841Z #47 873.9 copying 
3rdparty/cutlass/include/cutlass/gemm/kernel/gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9812688Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemv_batched_strided.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9814422Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/grouped_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9816124Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/params_sparse_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9817830Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/params_universal_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9819513Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9821253Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9823058Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9824838Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9826438Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 
2025-09-07T10:37:04.9828165Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9830065Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9832032Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9833891Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9835745Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9837653Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9839566Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9841356Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9843049Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9844747Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9846505Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9848401Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9850598Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm70_gemm.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9852287Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm70_gemm_array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9854187Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9856175Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9858011Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9859746Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9861627Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9863630Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9865430Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9867231Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9869065Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9870810Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9872502Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9874273Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9875927Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sparse_gemm.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9877548Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9879209Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9880938Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/static_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9882590Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/symm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9884186Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9885834Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9887509Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/tile_scheduler_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9889157Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/trmm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:04.9890415Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:04.9891882Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:04.9893434Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma_sm50.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:04.9895002Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma_sm60.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:04.9896558Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma_sm61.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:04.9897844Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9899258Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_ell_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9901046Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_gemv_core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9902832Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9904679Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9906423Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9908238Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm70.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9910020Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9911788Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9913630Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9915493Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9917367Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9919196Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9921058Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9923027Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9925007Z #47 873.9 copying 
3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9926946Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9928819Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9930690Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9932867Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9934883Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9936882Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9938780Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_sparse_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9940574Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9942409Z 
#47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/ell_mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9944298Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/ell_mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9945981Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9947648Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/index_remat.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9949713Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9951484Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_blas3_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9953427Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9955325Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9957111Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9958985Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_base.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9960904Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9962902Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9964703Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_singlestage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9966539Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9968381Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_sparse_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9970138Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_sparse_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9972204Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9974103Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9976010Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:04.9977481Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9978818Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9980547Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9982222Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9984031Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9985722Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9987401Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9989078Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9990654Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9992185Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9993806Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9995558Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9997267Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:04.9999026Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0000758Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0002343Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0284219Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0285829Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_simt_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0287417Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_simt_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0289006Z #47 873.9 copying 
3rdparty/cutlass/include/cutlass/gemm/warp/mma_sparse_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0290707Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0292591Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0294271Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0295964Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0297666Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0299347Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0301085Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0302901Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0304607Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0306370Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0308075Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0309734Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0311345Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_with_reduction_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0312986Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/scale_bias_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0314638Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/softmax_scale_bias_transform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0316328Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/tile_iterator_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.0317787Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0319063Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/gemm_coord.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0320316Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/half.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0321572Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/integer_subbyte.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0322916Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/kernel_hardware_info.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0324335Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/kernel_hardware_info.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0325664Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/kernel_launch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0326744Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0327875Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/layout.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0329319Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0330741Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/permute.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0332460Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/pitch_linear.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0333948Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0335498Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm70.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0337136Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0338822Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0340377Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/layout/vector.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.0341761Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0343059Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/matrix_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0344457Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/matrix_shape.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0345783Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/numeric_conversion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0347105Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/numeric_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0348387Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/numeric_types.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0349859Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.0351122Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/pipeline/pipeline.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.0352709Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/pipeline/sm100_pipeline.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.0354327Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/pipeline/sm90_pipeline.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.0355824Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/pitch_linear_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0357022Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T10:37:05.0358253Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/platform/platform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T10:37:05.0359699Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/predicate_vector.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0361063Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/quaternion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0362481Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/real.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0363646Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.0364961Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/reduce_split_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.0366632Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/tensor_reduce.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.0368379Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_contiguous.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.0370207Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_strided.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.0371866Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.0373307Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/reduce_softmax_final.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.0375162Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/reduce_split_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.0377022Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_contiguous.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.0378959Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_strided.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.0380427Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T10:37:05.0381779Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/thread/reduce.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T10:37:05.0383793Z #47 873.9 copying 
3rdparty/cutlass/include/cutlass/reduction/thread/reduction_operators.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T10:37:05.0385371Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/reduction/threadblock_swizzle.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction 2025-09-07T10:37:05.0386724Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/relatively_equal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0387934Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/semaphore.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0389144Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/subbyte_reference.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0390370Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/tensor_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0391539Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/tensor_ref.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0392744Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/tensor_ref_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0393974Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/tensor_view.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0395236Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/tensor_view_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0396457Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/tfloat32.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0397431Z #47 873.9 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T10:37:05.0398478Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/thread/matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T10:37:05.0399674Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/trace.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0400706Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T10:37:05.0402097Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/collective/sm90_wgmma_transpose.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T10:37:05.0403429Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T10:37:05.0404781Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/device/transform_universal_adapter.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T10:37:05.0406096Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.0407405Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/kernel/filter_format_transformer.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.0409122Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/kernel/sm90_sparse_gemm_compressor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.0410808Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/kernel/sparse_gemm_compressor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.0412777Z #47 
873.9 copying 3rdparty/cutlass/include/cutlass/transform/pitch_linear_thread_map.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform 2025-09-07T10:37:05.0414125Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T10:37:05.0415503Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/thread/transpose.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T10:37:05.0417251Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/thread/unary_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T10:37:05.0418636Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0420164Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/ell_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0422173Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0424408Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0426532Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0428506Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_iterator.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0430383Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0432307Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_2dthreadtile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0434259Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0436260Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_triangular_matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0438193Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0440058Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_2dthreadtile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0442004Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_triangular_matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0443917Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_vector_access_iterator.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0445829Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_scale_bias_vector_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0447713Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0449995Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0452307Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear_direct_conv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0454609Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0456759Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0458852Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0460946Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0463202Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear_2dthreadtile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0465298Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0467319Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0469263Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/vector_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.0470717Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T10:37:05.0472087Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/transform/warp/vector_fragment_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T10:37:05.0473623Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/uint128.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0474867Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/version.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0476114Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/wmma_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0477377Z #47 873.9 copying 3rdparty/cutlass/include/cutlass/workspace.h 
-> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.0478497Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0479808Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/GPU_Clock.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0481481Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/command_line.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0483194Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/cublas_wrappers.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0484861Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/debug.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0486505Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_dump.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0488234Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_groupnorm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0489948Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_layernorm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0491889Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_memory.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0493656Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nchw_to_nhwc.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0495471Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nhwc_padding.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0497267Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nhwc_pooling.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0499050Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nhwc_to_nchw.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0500807Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_rmsnorm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0502535Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0504391Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/distribution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0506085Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/exceptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0507825Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/gett_commandline.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0509536Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/helper_cuda.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0511199Z #47 
873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_reorder.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0512856Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_tensor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0514586Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_tensor_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0516321Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_uncompress.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0518014Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/index_sequence.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0519729Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0521448Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/packed_stride.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0523444Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/print_error.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0524854Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T10:37:05.0526491Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/detail/inner_product.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T10:37:05.0528639Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/detail/linear_to_coordinate.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T10:37:05.0530349Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0532250Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0534390Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0536501Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0538736Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gemm_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0540909Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gett.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0542624Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.0544484Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/kernel/gemm.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.0546773Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.0549256Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_foreach.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.0551721Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/rank_2k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0553913Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_compare.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0556071Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0558253Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_foreach.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0560504Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_reduce.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0562754Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_relu.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.0564412Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T10:37:05.0566228Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/thread/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T10:37:05.0567901Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0569481Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/conv.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0571748Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0573885Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/error_metrics.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0576024Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0578099Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0580283Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gemm_planar_complex.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0582417Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gett.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0584539Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0586571Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0588634Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/rank_k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0590636Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/symm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0592648Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/symm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0594722Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0596828Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 
2025-09-07T10:37:05.0598906Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_copy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0600986Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0603094Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0605144Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0607222Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_foreach.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0609267Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_norm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0611542Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0613727Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0615849Z #47 873.9 copying 
3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0617929Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/trmm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.0619854Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/tensor_view_io.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0621571Z #47 873.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/type_traits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.0622818Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0623920Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/async.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0625143Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/async_logger-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0626406Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/async_logger.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0627426Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.0628452Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/cfg/argv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.0629701Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/cfg/env.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.0631020Z #47 873.9 copying 
3rdparty/spdlog/include/spdlog/cfg/helpers-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.0632336Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/cfg/helpers.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.0633606Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/common-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0634811Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0635857Z #47 873.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0637021Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/details/backtracer-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0638494Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/details/backtracer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0639934Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/details/circular_q.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0641374Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/details/console_globals.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0642842Z #47 873.9 copying 3rdparty/spdlog/include/spdlog/details/file_helper-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0644293Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/file_helper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0645738Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/fmt_helper.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0647198Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/log_msg-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0648596Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/log_msg.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0650363Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/log_msg_buffer-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0651933Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/log_msg_buffer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0653429Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/mpmc_blocking_q.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0654917Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/null_mutex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0656488Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/os-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0657953Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/os.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0659709Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/periodic_worker-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0661341Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/periodic_worker.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0663054Z #47 874.0 copying 
3rdparty/spdlog/include/spdlog/details/registry-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0664753Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/registry.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0666295Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/synchronous_factory.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0667911Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/tcp_client-windows.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0669540Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/tcp_client.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0671100Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/thread_pool-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0672643Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/thread_pool.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0689417Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/udp_client-windows.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0691157Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/udp_client.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0692825Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/details/windows_include.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.0694000Z #47 874.0 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0695195Z #47 874.0 copying 
3rdparty/spdlog/include/spdlog/fmt/bin_to_hex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0696329Z #47 874.0 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0697576Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/args.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0699093Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/chrono.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0700609Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/color.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0702127Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/compile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0703744Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0705238Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/fmt.license.rst -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0706766Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/format-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0708261Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/format.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0709729Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/locale.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0711177Z #47 874.0 copying 
3rdparty/spdlog/include/spdlog/fmt/bundled/os.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0712667Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/ostream.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0714147Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/printf.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0715626Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/ranges.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0717080Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/std.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0718558Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/xchar.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.0719933Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/chrono.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0721228Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/compile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0722501Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/fmt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0723745Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/ostr.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0725003Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/ranges.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0726269Z #47 874.0 copying 
3rdparty/spdlog/include/spdlog/fmt/std.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0727556Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fmt/xchar.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.0728802Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/formatter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0730023Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/fwd.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0731290Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/logger-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0732714Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/logger.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0733923Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/mdc.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0735209Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/pattern_formatter-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0736584Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/pattern_formatter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0737684Z #47 874.0 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0738822Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/android_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0740296Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/ansicolor_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0741769Z #47 874.0 
copying 3rdparty/spdlog/include/spdlog/sinks/ansicolor_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0743223Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/base_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0744741Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/base_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0746131Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/basic_file_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0747553Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/basic_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0749121Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/callback_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0750796Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/daily_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0752218Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/dist_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0753647Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/dup_filter_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0755094Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/hourly_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0756541Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/kafka_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0757936Z 
#47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/mongo_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0759380Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/msvc_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0760782Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/null_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0762330Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/ostream_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0763695Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/qt_sinks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0765077Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/ringbuffer_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0766527Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/rotating_file_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0768002Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/rotating_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0769404Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0770724Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0772394Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_color_sinks-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0773887Z #47 
874.0 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_color_sinks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0775373Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_sinks-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0776829Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_sinks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0778297Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/syslog_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0779727Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/systemd_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0781135Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/tcp_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0782498Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/udp_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0784059Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/win_eventlog_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0785482Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/wincolor_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0786906Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/sinks/wincolor_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.0788231Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/spdlog-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0789435Z #47 874.0 
copying 3rdparty/spdlog/include/spdlog/spdlog.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0790658Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/stopwatch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0791922Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/tweakme.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0793117Z #47 874.0 copying 3rdparty/spdlog/include/spdlog/version.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.0793889Z #47 874.0 running build_ext 2025-09-07T10:37:05.0794263Z #47 874.0 installing to build/bdist.linux-x86_64/wheel 2025-09-07T10:37:05.0794658Z #47 874.0 running install 2025-09-07T10:37:05.1282734Z #47 874.0 running install_lib 2025-09-07T10:37:05.1283131Z #47 874.0 creating build/bdist.linux-x86_64/wheel 2025-09-07T10:37:05.1283750Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer 2025-09-07T10:37:05.1284591Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1285678Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/__main__.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1286891Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/activation.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1288204Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/aot.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1289267Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/artifacts.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1290703Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/attention.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1292266Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/autotuner.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1293478Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cascade.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1294712Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cuda_utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1296100Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/decode.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1297543Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/deep_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1298746Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/fp4_quantization.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1300231Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/fp8_quantization.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1301570Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1302738Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/green_ctx.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1304041Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/mla.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1305182Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/norm.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1306198Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/page.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1307373Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/pod.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1308694Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/prefill.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1309780Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/quantization.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1311001Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/rope.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1312542Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/sampling.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1314005Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/sparse.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1315356Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/tllm_utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1316465Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1317683Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/_build_meta.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T10:37:05.1318646Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/cute_dsl 2025-09-07T10:37:05.1319591Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl/blockscaled_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cute_dsl 2025-09-07T10:37:05.1321045Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cute_dsl 2025-09-07T10:37:05.1322033Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data 2025-09-07T10:37:05.1323052Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/custom_backend.py -> build/bdist.linux-x86_64/wheel/./flashinfer/data 2025-09-07T10:37:05.1324510Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/setup.py -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data 2025-09-07T10:37:05.1325373Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc 2025-09-07T10:37:05.1326306Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/activation.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1327800Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/aot_extension_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1329409Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1331503Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1333207Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1334812Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention_paged_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1336590Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1338129Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1340151Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1341976Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_jit_pybind.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1343707Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1345135Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1346723Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_cute_sm80.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1348493Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1350564Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1352131Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1353648Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1355022Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1356492Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1358067Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1359876Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_sm90_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1361412Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_sm90_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1362894Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_sm90_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1364385Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1365945Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1367592Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1369456Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_fp8_paged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1371218Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_fp8_ragged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1372980Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_fp8_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1374659Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1376402Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_paged_kernel_inst.jinja -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1378082Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_paged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1379709Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_ragged_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1381621Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_ragged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1383526Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1385073Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1386749Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1388479Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1390088Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/blackwell_fmha_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1391695Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/bmm_fp8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1392940Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cascade.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1394269Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cudnn_sdpa_kernel_launcher.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1395801Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cudnn_sdpa_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1397354Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cutlass_mla.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1398707Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_cascade_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1400565Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_gemm_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1402027Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_gemm_sm90_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1403639Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_mla_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1405280Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_norm_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1406720Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1408216Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_ops_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1409655Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_page_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1411249Z #47 
874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_quantization_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1412877Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_rope_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1414596Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_sampling_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1416226Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fmha_cutlass_sm100.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1418127Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fmha_cutlass_sm100_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1419824Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp4_gemm_cutlass.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1421385Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp4_gemm_cutlass.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1422889Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp8_gemm_cutlass.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1424495Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp8_gemm_cutlass.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1425587Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/fused_moe 2025-09-07T10:37:05.1426494Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:05.1428038Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_instantiation.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:05.1430056Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:05.1432430Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T10:37:05.1434326Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/gemm_groupwise_sm100.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1436061Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/gemm_groupwise_sm100_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1438029Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/gemm_sm100_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1439568Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1441013Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1442772Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1444791Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1446489Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1448177Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_sm100_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1450014Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1451944Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1453401Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/logging.cc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1454852Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/norm.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1455906Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal 2025-09-07T10:37:05.1456682Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/cpp 2025-09-07T10:37:05.1457540Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:05.1459063Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/envUtils.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:05.1461130Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/logger.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:05.1463144Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/memoryUtils.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:05.1465335Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/stringUtils.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:05.1467190Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/tllmException.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T10:37:05.1468685Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T10:37:05.1470227Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/kernels/quantization.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T10:37:05.1471670Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/include 2025-09-07T10:37:05.1472635Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/include/tensorrt_llm 2025-09-07T10:37:05.1473727Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1475437Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/NvInferRuntime.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1477902Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/assert.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1480242Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaBf16Wrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1482791Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaFp8Utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1485332Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1487600Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/dataType.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1489772Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/logger.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1492503Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/quantization.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1495019Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/stringUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T10:37:05.1497566Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/tllmException.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 
2025-09-07T10:37:05.1499129Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm 2025-09-07T10:37:05.1500095Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1501829Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cublasMMWrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1504600Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaBf16Fallbacks.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1506762Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaDriverWrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1509257Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaTypeUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1511183Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/envUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1513120Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/memoryUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1515066Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/quantTypeUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1517411Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1519528Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/workspace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T10:37:05.1520975Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions 2025-09-07T10:37:05.1522024Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include 2025-09-07T10:37:05.1523199Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1524503Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:05.1526588Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_red_global.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:05.1529481Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_sm90_multimem.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:05.1532743Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_traits_sm90_multimem.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:05.1535769Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/grid_dependency_control.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:05.1538673Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T10:37:05.1540800Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication 2025-09-07T10:37:05.1542340Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T10:37:05.1545029Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective/sm90_allreduce_nvls_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T10:37:05.1548198Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/compute_occupancy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1550655Z #47 874.0 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail 2025-09-07T10:37:05.1552119Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T10:37:05.1554516Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective/mixed_input_utils.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T10:37:05.1556944Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue 2025-09-07T10:37:05.1558434Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T10:37:05.1560995Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective/epilogue_moe_finalize.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T10:37:05.1563306Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 2025-09-07T10:37:05.1565585Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_allreduce_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 
2025-09-07T10:37:05.1567804Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T10:37:05.1569895Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread/fused_activations.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T10:37:05.1572982Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue_helpers.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1575098Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm 2025-09-07T10:37:05.1576518Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1578062Z #47 874.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:05.1580604Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_gated.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:05.1584366Z #47 874.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_interleaved.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:05.1588378Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_mixed_input.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T10:37:05.1591986Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_gated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1595239Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_interleaved.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1598504Z #47 874.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1601932Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_array_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1605051Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_gated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1607978Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_interleaved.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1611118Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1614749Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1618203Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1621794Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_interleaved_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T10:37:05.1624498Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1626666Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/default_fpA_intB_traits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1629655Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1632819Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_routine.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 
2025-09-07T10:37:05.1635939Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_traits.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1639033Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_moe_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1642198Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_universal_allreduce.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1645204Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/mixed_gemm_B_layout.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1647951Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cute_util.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1651268Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cutlass_kernel.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1654389Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1657712Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1661107Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized_pingpong.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T10:37:05.1663675Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1665760Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1668633Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_multistage.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1671828Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1674900Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1677899Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma_bf16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1680872Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1683899Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1686800Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_finegrained.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1689735Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_percol.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1693041Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1696309Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_finegrained.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1699651Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_percol.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T10:37:05.1701992Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:05.1704429Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/default_mma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:05.1707649Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:05.1710724Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_dequantizer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T10:37:05.1713594Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm_configs.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1716491Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1719219Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/system_barrier.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 
2025-09-07T10:37:05.1721898Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/tile_interleaved_layout.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1723937Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform 2025-09-07T10:37:05.1725356Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T10:37:05.1727874Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T10:37:05.1730034Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T10:37:05.1732487Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util/gather_tensor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T10:37:05.1735440Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/weight_only_quant_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T10:37:05.1737328Z #47 874.1 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1738339Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:05.1740098Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:05.1742764Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:05.1745402Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_type_conversion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T10:37:05.1747221Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T10:37:05.1749525Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T10:37:05.1752340Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm_stub.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T10:37:05.1754378Z #47 874.1 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1756352Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1759072Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1762204Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1765327Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1767946Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1770626Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1773614Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_bf16_out_bf16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1776434Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_f16_out_f16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1779287Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_bf16_out_bf16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1782462Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_f16_out_f16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1785294Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_per_col_f16_out_f16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1787935Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1790574Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1793169Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1795860Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1798399Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1800959Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1803416Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 
2025-09-07T10:37:05.1805918Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1808439Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template_sm90.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T10:37:05.1810363Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T10:37:05.1812726Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T10:37:05.1815663Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T10:37:05.1817705Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:05.1819519Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/common.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:05.1822045Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/cutlass_kernel_selector.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:05.1824649Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_gemm_kernels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:05.1827001Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_kernels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:05.1829349Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_util_kernels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T10:37:05.1831086Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1832291Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1834245Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1836931Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1839654Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1842297Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1844978Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1847744Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T10:37:05.1850830Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_bf16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1853511Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp4.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1856107Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1858697Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1861320Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1863989Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1866412Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1869107Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1871705Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1874289Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp32_fp32.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1876787Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp4_fp4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1879365Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1881909Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1884450Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_uint4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1885654Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1886878Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1888266Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws_mixed_dtype.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1889475Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1890676Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_tma_warp_specialized_traits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T10:37:05.1891803Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1892773Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1893211Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 
2025-09-07T10:37:05.1894164Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T10:37:05.1895096Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T10:37:05.1896137Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1897118Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1898069Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1898999Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T10:37:05.1899423Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T10:37:05.1900358Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime/torchUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T10:37:05.1900769Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1901642Z 
#47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1902560Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1903562Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1904505Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1905312Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1906081Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/thUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T10:37:05.1906645Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nvshmem_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1907138Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/page.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1907653Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1908194Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1908777Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1909318Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1909914Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1910502Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pytorch_conversion_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1911082Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pytorch_extension_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1911638Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/quantization.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1912140Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/renorm.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1912626Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/rope.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1913198Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/runtime_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1913715Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/sampling.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1914276Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1914863Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1915495Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1916074Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1916694Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1917234Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1917811Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1918457Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1919029Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_fp8_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1919688Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_fp8_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1920275Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_jit_pybind.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1920917Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1921489Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1922283Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1923006Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1923662Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1924328Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1924907Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_allreduce.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1925534Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_allreduce_fusion.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1926106Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_alltoall.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1926764Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_batched_gemm_runner.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1927410Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fmha_kernel_launcher.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1928075Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_dev_kernel.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1928730Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_kernel_launcher.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1929399Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_routing_deepseek.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1930046Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_routing_llama4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1930719Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_routing_renormalize.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1931578Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_runner.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1932255Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_gemm_runner.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1932914Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_mnnvl_allreduce.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1933585Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_moe_allreduce_fusion.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1934228Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/vllm_custom_all_reduce.cu 
-> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T10:37:05.1934530Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include 2025-09-07T10:37:05.1934825Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer 2025-09-07T10:37:05.1935600Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/activation.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.1936365Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/allocator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.1937133Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/arch_condition.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.1937513Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1937934Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T10:37:05.1938409Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1939565Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/fmha_common.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1940720Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/fmha_fusion.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1942062Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_epilogue_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1943394Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_mainloop_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1944696Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_epilogue_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1945836Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_mainloop_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1946992Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_load_cpasync_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1948111Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_load_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T10:37:05.1948513Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T10:37:05.1949855Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/common/pow_2.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T10:37:05.1950315Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T10:37:05.1951441Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device/fmha.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T10:37:05.1952533Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device/sm100_mla.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T10:37:05.1953561Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/fmha_cutlass_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T10:37:05.1954070Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1955169Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/fmha_options.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1956307Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/fmha_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1957431Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/gather_tensor.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1958711Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_fwd_kernel_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1959941Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_gen_kernel_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1961148Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_reduction.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1962384Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1963419Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_mla_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T10:37:05.1964286Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/plan.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T10:37:05.1965064Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/cascade.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1965862Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/cutlass_mla.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1966634Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/decode.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1967465Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/decode_mla_cute_sm80.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1968334Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/default_decode_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1969177Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/default_prefill_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1969935Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/heap.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1970731Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1971136Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1972334Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/attention_updater.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1973339Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/block_sparse_gather.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1974326Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/default_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1975341Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/epilogue.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1976354Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/kernel_traits.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1977327Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/mainloop.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1978298Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/mainloop_mma.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1979272Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/named_barrier.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1980259Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1980734Z #47 874.1 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1981877Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/epilogue.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1983045Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/kernel_traits.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1984362Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/mainloop_load.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1985424Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/mainloop_mma.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1986484Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/mainloop_sparse_load.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1987533Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/prefill_sm90.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T10:37:05.1988427Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/sparse_mainloop.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1989308Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/tile_scheduler.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1990141Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/utils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1991028Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/variant_helper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1991926Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/variants.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T10:37:05.1992724Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mask.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1993472Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mla.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1994246Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mla_hopper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1995041Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mla_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1995834Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/persistent.cuh -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1996672Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/persistent_template.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1997432Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/pod.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1998209Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/prefill.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1999000Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/scheduler.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.1999818Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/state.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.2000627Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/variant_helper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.2001423Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/variants.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T10:37:05.2002207Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention_impl.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2002497Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2003273Z #47 874.1 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_allreduce.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2004060Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_allreduce_fusion.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2361878Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_alltoall.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2362838Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_mnnvl_allreduce.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2363920Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_moe_allreduce_fusion.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2364866Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/vllm_custom_all_reduce.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T10:37:05.2365595Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/cp_async.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2366345Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/cubin_loader.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2367107Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/cutlass_utils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2367840Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/exception.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2368585Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/fastdiv.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2369277Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/fp16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2370007Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/fp4_layout.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2370799Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/frag_layout_swizzle.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2371211Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2372257Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/bmm_fp8.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2373131Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/cutlass_gemm_configs.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2373958Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp4_gemm_cutlass.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2374829Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp4_gemm_cutlass_template.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2375767Z #47 874.1 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp4_gemm_template_sm100.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2376595Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp8_gemm_cutlass.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2377478Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp8_gemm_cutlass_template.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2378337Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp8_gemm_template_sm100.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2379234Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/gemm_groupwise_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2380065Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2381006Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_fp8_groupwise_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2381901Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_lora.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2383124Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_mxfp4_groupwise_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 
2025-09-07T10:37:05.2383938Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_sm90.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2384743Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemv.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T10:37:05.2385569Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/layout.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2386285Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/logging.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2387001Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/math.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2387707Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/mma.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2388452Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/norm.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2389173Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/page.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2389933Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/permuted_smem.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2390643Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/pos_enc.cuh -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2391421Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/profiler.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2392187Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/quantization.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2392919Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/sampling.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2393253Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm 2025-09-07T10:37:05.2393650Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T10:37:05.2394610Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/KernelRunner.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T10:37:05.2395171Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2396415Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmEnums.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2397645Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmInterface.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2398872Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2400014Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/Enums.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2401251Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmGatedActOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2402518Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2403781Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2405050Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParamsDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2406225Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelTraits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2407421Z #47 874.1 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/TmaDescriptor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T10:37:05.2408035Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm 2025-09-07T10:37:05.2408617Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:05.2409929Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CommonUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:05.2411516Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CudaKernelLauncher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:05.2412900Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/DtypeDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:05.2414245Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/MmaDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:05.2415592Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/SfLayoutDecl.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T10:37:05.2416422Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm 2025-09-07T10:37:05.2416803Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:05.2417804Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaBf16Fallbacks.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:05.2418786Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaBf16Wrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:05.2419729Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaFp8Utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:05.2420712Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaTypeUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:05.2421633Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T10:37:05.2422030Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2422440Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T10:37:05.2423580Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/cubin/kernelMetaInfo.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T10:37:05.2424488Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/decoder_impl_common.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2425422Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/decoder_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2426322Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/fmhaKernels.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2427206Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/fmhaRunner.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2428139Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/fmhaRunnerParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2429062Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/kernelParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2429911Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/lse.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T10:37:05.2430318Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2431217Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/DevKernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2432134Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/IntFastDiv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2433071Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/RoutingKernel.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2434016Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/RoutingKernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2434994Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/RoutingKernelTopK.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2435874Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/runner.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T10:37:05.2436237Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm 2025-09-07T10:37:05.2436824Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2437906Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/Enums.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2439014Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmInterface.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2440095Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2441213Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2442313Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelTraits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2443403Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/TmaDescriptor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T10:37:05.2443918Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm 2025-09-07T10:37:05.2444440Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:05.2445671Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CommonUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:05.2446957Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CudaKernelLauncher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:05.2448141Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/DtypeDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:05.2449648Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/MmaDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:05.2451016Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/SfLayoutDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T10:37:05.2451759Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/utils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2452531Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/vec_dtypes.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T10:37:05.2452803Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/tvm_binding 2025-09-07T10:37:05.2453467Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_decode.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2454322Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_decode_customize_config.jinja -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2455054Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_decode_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2455751Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2456476Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2457182Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2457839Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2458521Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2459296Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2460034Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2460778Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2461574Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_sm90_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2462388Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_sm90_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2463152Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/sampling.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2463788Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/sampling_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2464406Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/tvm_binding_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T10:37:05.2464874Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/version.txt -> build/bdist.linux-x86_64/wheel/./flashinfer/data 2025-09-07T10:37:05.2465082Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot 2025-09-07T10:37:05.2465330Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/logging 2025-09-07T10:37:05.2465914Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/logging/logging.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/logging 2025-09-07T10:37:05.2466721Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2468860Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2469745Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2471995Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2472805Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2474823Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2475622Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2477762Z #47 874.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2478570Z #47 874.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2480716Z #47 874.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2481579Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2483842Z #47 874.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2484642Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2486678Z #47 874.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2487490Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2489654Z #47 874.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.2490486Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2493081Z #47 874.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2494072Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2497015Z #47 874.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.2497987Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3422234Z #47 874.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3423622Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3425998Z #47 874.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3426972Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3429448Z #47 874.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3430407Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3433029Z #47 874.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3433856Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3436007Z #47 874.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3436941Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3439326Z #47 874.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3440176Z #47 874.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3442373Z #47 874.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3443224Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3445586Z #47 874.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3446352Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3448369Z #47 874.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3449547Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3452031Z #47 874.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.3453021Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3455536Z #47 874.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3456500Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3459201Z #47 874.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.3460078Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4441494Z #47 874.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4442449Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4444828Z #47 874.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4445917Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4448340Z #47 874.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4449753Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4452478Z #47 874.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4453435Z #47 874.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4455830Z #47 874.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4456756Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4459283Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4460216Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4462721Z #47 874.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4463834Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4466398Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T10:37:05.4467295Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4469591Z #47 874.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4470522Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4472920Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T10:37:05.4473865Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4476408Z #47 874.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4477399Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4480046Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4481018Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4483556Z #47 874.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4484559Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4487210Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4488184Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4490766Z #47 874.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4491989Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4494667Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T10:37:05.4495584Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:05.4497853Z #47 874.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:05.5592061Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:05.5595769Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:05.5599493Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:05.5603031Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:05.5606511Z #47 874.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:05.5610036Z #47 874.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:05.5613880Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:05.5617580Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T10:37:05.5621195Z #47 874.5 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:05.5625374Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T10:37:05.5628844Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:05.5632383Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T10:37:05.5636008Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:05.5639599Z #47 874.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T10:37:05.5642456Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T10:37:05.5643712Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/fmha_cutlass_sm100a/fmha_cutlass_sm100a.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T10:37:05.5645516Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:05.5648637Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:05.5652296Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:05.5655610Z #47 874.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:05.5658870Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:05.5662125Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T10:37:05.5665467Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:05.5668795Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T10:37:05.5671325Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/mla 2025-09-07T10:37:05.5672338Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/mla/mla.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/mla 2025-09-07T10:37:05.5673364Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/cascade 2025-09-07T10:37:05.5674457Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/cascade/cascade.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/cascade 2025-09-07T10:37:05.5675538Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/norm 2025-09-07T10:37:05.5676551Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/norm/norm.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/norm 2025-09-07T10:37:05.5677643Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/page 2025-09-07T10:37:05.5678673Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/page/page.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/page 2025-09-07T10:37:05.5679802Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/quantization 2025-09-07T10:37:05.5681050Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/quantization/quantization.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/quantization 2025-09-07T10:37:05.5682240Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/rope 2025-09-07T10:37:05.5683259Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/rope/rope.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/rope 2025-09-07T10:37:05.5684349Z #47 874.5 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/aot/sampling 2025-09-07T10:37:05.5685558Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/sampling/sampling.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/sampling 2025-09-07T10:37:05.5686723Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/trtllm_utils 2025-09-07T10:37:05.5687846Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/trtllm_utils/trtllm_utils.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/trtllm_utils 2025-09-07T10:37:05.6592539Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass 2025-09-07T10:37:05.6593231Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include 2025-09-07T10:37:05.6594491Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6595289Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6596735Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/axpby.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6598584Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/clear.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6600507Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/cooperative_copy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6602471Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/cooperative_gemm.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6604420Z #47 874.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/copy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6606298Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/fill.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6608170Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/functional.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6610024Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/gemm.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6612185Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/prefer.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6614122Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/prefetch.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6616096Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/tensor_algorithms.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6618095Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/tensor_reduce.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T10:37:05.6620089Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/tuple_algorithms.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 
2025-09-07T10:37:05.6621489Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6622887Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/cluster_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6624796Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/cluster_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6626540Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/config.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6628252Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6630006Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6631754Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm100_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6633511Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm50.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6635242Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6636967Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm80.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6638732Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6640522Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm90_desc.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6642281Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm90_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6644003Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6645712Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6647442Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm100_desc.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6649561Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm100_umma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6651426Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm120.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6653229Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm120_sparse.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6655043Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm61.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6656885Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm70.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6658644Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6660414Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm80.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6662185Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm89.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6664060Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6665801Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_desc.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6667563Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6669325Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_ext.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6671177Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6673018Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6674869Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/simd_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6676664Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/tmem_allocator_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6678428Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/util.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T10:37:05.6679653Z #47 874.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6680912Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_atom.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6682656Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6684439Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6686276Z #47 
874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_im2col.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6688114Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6689961Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm50.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6692002Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6693838Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm80.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6695686Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6697620Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_im2col.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6699507Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6701428Z #47 874.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6703419Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_atom.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6705176Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6706944Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6708756Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6710562Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6712382Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm61.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6714162Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm70.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6715929Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6717711Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm80.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6719488Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm89.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6721255Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6723051Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6724929Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_ext.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6726787Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6728684Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6730578Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/partitioner.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T10:37:05.6732548Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/config.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6733814Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/container 
2025-09-07T10:37:05.6735245Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/alignment.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6737162Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6739132Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/array_aligned.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6741137Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/array_subbyte.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6743065Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/bit_field.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6745049Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/cuda_types.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6746908Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/tuple.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6748920Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/type_list.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T10:37:05.6750909Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/int_tuple.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6752584Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6754287Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/layout_composed.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6755591Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6757009Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/arithmetic_tuple.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6758996Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/complex.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6760840Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/int.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6762805Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/integer_sequence.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6764759Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/integral_constant.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6766668Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/integral_ratio.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6768503Z #47 874.6 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/math.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6770318Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/numeric_types.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6772435Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/real.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T10:37:05.6774252Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6775952Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_base.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6777742Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_flagged.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6779501Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6781245Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_swizzle.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6782961Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/stride.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6784701Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/swizzle.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6786359Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/swizzle_layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6788026Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/tensor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6789651Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/tensor_impl.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6791308Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/tensor_zip.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6793001Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/underscore.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T10:37:05.6794202Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6795444Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/debug.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6797151Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6798902Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print_latex.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6800655Z #47 874.6 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print_svg.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6802416Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print_tensor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6804172Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/type_traits.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T10:37:05.6805425Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6806730Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/aligned_buffer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6807998Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6809310Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/arch.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6811132Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/barrier.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6813192Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/cache_operation.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6815077Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/config.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6816983Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/grid_dependency_control.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6818907Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/memory.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6820763Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/memory_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6822620Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/memory_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6824530Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6826324Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm100.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6828084Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm50.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6829854Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm60.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6831647Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm61.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6833402Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6835171Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6836935Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6838688Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm89.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6840487Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm90.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6842289Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6844141Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm89.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6845970Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/reg_reconfig.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6847754Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/simd.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6849838Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/simd_sm60.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6851763Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/simd_sm61.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6853619Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/synclog.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6855442Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6857259Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6859092Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma_sm72.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6860989Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T10:37:05.6862856Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6864571Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/array_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6866583Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/array_subbyte.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6868286Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/barrier.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6869970Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/bfloat16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6871620Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6873288Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/blas3_types.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6875044Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/block_striped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6876786Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/cluster_launch.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6878545Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6880233Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/constants.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.6881482Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv 
2025-09-07T10:37:05.6882355Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:05.6883333Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:05.6885040Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:05.6887478Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:05.6889833Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:05.6892445Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T10:37:05.6894847Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/collective_builder.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:05.6897096Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/collective_conv.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:05.6899259Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:05.6901592Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:05.6904151Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T10:37:05.6906282Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/conv2d_problem_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:05.6908179Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/conv3d_problem_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:05.6910124Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/convnd_problem_shape.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:05.6912011Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:05.6913872Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:05.6915221Z #47 874.6 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:05.6916706Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/conv_universal_adapter.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:05.6918799Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/direct_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:05.6920899Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:05.6923039Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T10:37:05.6925088Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/dispatch_policy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T10:37:05.6926488Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6927938Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/conv_universal.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6930003Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6932295Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6934407Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6936625Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6938841Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6941080Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6943468Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6945661Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_group_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6947771Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6950209Z 
#47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6952368Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6954504Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6956654Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6958886Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6961079Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_wgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6963244Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6965537Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6982809Z #47 
874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6985192Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6987465Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_depthwise_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6989541Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/direct_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6991624Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6993768Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6996032Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.6998259Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 
2025-09-07T10:37:05.7000556Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.7002851Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.7005126Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T10:37:05.7006688Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T10:37:05.7008143Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/thread/depthwise_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T10:37:05.7009611Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7011603Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7014159Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7016800Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7019423Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7022037Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7024738Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7027266Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7029789Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7032294Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 
2025-09-07T10:37:05.7034788Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7037264Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7039732Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7042020Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7044162Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7046497Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7049167Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7051991Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7054687Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7057256Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7059841Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7062412Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7065079Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7067648Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 
2025-09-07T10:37:05.7070155Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7072684Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7075154Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7077461Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7079765Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7082267Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7084796Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 
2025-09-07T10:37:05.7087340Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7089799Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7092592Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7095440Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7098038Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7100617Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7103193Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7105559Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7107886Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7110245Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7112552Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7114804Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7117100Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7119499Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7121905Z #47 874.6 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7124183Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/threadblock_swizzle.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T10:37:05.7125758Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:05.7127190Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:05.7129231Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:05.7131587Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp/scale_bias_relu_transform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T10:37:05.7133506Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7135200Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/core_io.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7136971Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/cuda_host_adapter.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 
2025-09-07T10:37:05.7138757Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/cutlass.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7140078Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7141556Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/blockwise_scale_layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7143706Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/cluster.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7145587Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7147019Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T10:37:05.7148609Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective/mixed_input_utils.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T10:37:05.7151075Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/dependent_false.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7153074Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/helper_macros.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7155016Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7157058Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7159089Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/mma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7161247Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/sm100_blockscaled_layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7163210Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/sm100_tmem_helper.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T10:37:05.7165043Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/device_kernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7166387Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue 2025-09-07T10:37:05.7167297Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7168344Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:05.7170122Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:05.7172861Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:05.7175448Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:05.7178025Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:05.7180538Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T10:37:05.7182992Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_builder.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7185429Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_epilogue.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7187714Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 
2025-09-07T10:37:05.7190007Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue_array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7192253Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7194517Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7196908Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7199335Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7201732Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7204123Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7206521Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7208885Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7211569Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7214103Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7216736Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T10:37:05.7219088Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/dispatch_policy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue 2025-09-07T10:37:05.7220612Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7222201Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/callbacks.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7224456Z #47 874.6 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/operations.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7226698Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7229070Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7231450Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7233833Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7236184Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7238577Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7240915Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7243288Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7245640Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7248002Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7250686Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T10:37:05.7252366Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7253923Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/activation.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7256089Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/conversion_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7258237Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7260399Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7262825Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7265110Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7267323Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_clamp.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7269599Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_dgelu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7271812Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_drelu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7274012Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_gelu.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7276258Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7278560Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7280870Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_hardswish.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7283116Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7285387Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7287684Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7289928Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7292380Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu0.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7294719Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_residual_block.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7297058Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_sigmoid.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7299347Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_silu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7301691Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7304183Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7306421Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/reduction_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7308509Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/scale_type.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T10:37:05.7310044Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7311790Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7314254Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7316690Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7319081Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7321484Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7323856Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7326229Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7328637Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7331093Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7333743Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7336225Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7338715Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7341192Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7343593Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7346102Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7348542Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7351310Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7353689Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7355981Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7358382Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7360808Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_depthwise.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7363246Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_direct_store.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7365572Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7367915Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7370274Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7372923Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7375389Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7377820Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7380271Z 
#47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7382670Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7385195Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7387560Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7389929Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_workspace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7391619Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:05.7393374Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:05.7395851Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 
2025-09-07T10:37:05.7398315Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:05.7400747Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:05.7403163Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitors.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T10:37:05.7405531Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/interleaved_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7407873Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_iterator_parameter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7410200Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7412835Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7415356Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7417907Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7420468Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7422980Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7425524Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7427972Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7430415Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7432928Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7435341Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7437664Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7440054Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T10:37:05.7441707Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7443314Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7445614Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7447843Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7450352Z #47 
874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7452747Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7455036Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7457230Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/simt_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7459397Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7461532Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7463783Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7465935Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 
2025-09-07T10:37:05.7468155Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7470352Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7472492Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/volta_tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7474597Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T10:37:05.7476477Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/exmy_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7477772Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental 2025-09-07T10:37:05.7478732Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed 2025-09-07T10:37:05.7479808Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:05.7481614Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:05.7484172Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:05.7486799Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/full_barrier.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T10:37:05.7488622Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:05.7490407Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:05.7493231Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:05.7495927Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/full_barrier.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T10:37:05.7497827Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T10:37:05.7499791Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T10:37:05.7502572Z 
#47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T10:37:05.7504895Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/fast_math.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7506584Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/float8.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7508276Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/float_subbyte.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7510024Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/floating_point_nvrtc.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7511771Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/functional.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7513031Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:05.7513891Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7514881Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7516630Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7519129Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7521666Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7524211Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7526624Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7529037Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7531688Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_simt_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7534176Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7536650Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7539198Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7594585Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7597398Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockwise_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7599994Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7602536Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7605175Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7607742Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7610326Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7611925Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7613195Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7614488Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7615816Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T10:37:05.7616954Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7618168Z 
#47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder_decl.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7619320Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7620504Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma_decl.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7621717Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/fp8_accumulation.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7623192Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7624517Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7625735Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7626935Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7628153Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7629450Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7630608Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7631935Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7633087Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7634300Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7635473Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 
2025-09-07T10:37:05.7636690Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7637827Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7639026Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_array_tma_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7640133Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7641289Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7642409Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7643492Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm70_mma_twostage.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7644576Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_array_multistage.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7645665Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_multistage.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7646885Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7648154Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7649716Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7651091Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7652408Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7653736Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7654991Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7656286Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7657433Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7658615Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7659956Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7661304Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7662474Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7663783Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T10:37:05.7664222Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7665203Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/base_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7666271Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/default_gemm_configuration.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7667278Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/ell_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7668286Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7669200Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7670169Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_batched.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7671193Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7672164Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7673245Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7674202Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7675318Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7676421Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7677529Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7678583Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7679617Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_splitk_parallel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7680656Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7681742Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7682760Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7683905Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7685025Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7686181Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7687250Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_with_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7688191Z #47 874.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7689163Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7690264Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7691272Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/rank_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7692403Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/symm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7693427Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T10:37:05.7694394Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/dispatch_policy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:05.7695347Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:05.7696379Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/gemm_enumerated_types.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:05.7697381Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/group_array_problem_shape.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T10:37:05.7697784Z #47 874.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7698944Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_ell_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7699968Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7701018Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7702092Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7703495Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7704688Z #47 874.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7705800Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7706948Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7707975Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7709028Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7710161Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7711339Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7712451Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7713521Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7714663Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7715704Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7716815Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7717963Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7719122Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7720236Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7721356Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7722391Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7723384Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7724453Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7725500Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7726572Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7727553Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7728617Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7729790Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7730870Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7732236Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7733380Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7734454Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7735529Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7736584Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7737681Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/ell_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7738676Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7739710Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7740731Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_batched.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7741740Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7742906Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7744119Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7745282Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7746441Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7747479Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7748552Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7749954Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7751061Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7752163Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7753410Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7754484Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_splitk_parallel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7755642Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7756797Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_transpose_operands.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7757816Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7759017Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7760083Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_decl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7761140Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7762441Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7763581Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7764626Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7765733Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7766849Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7767827Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7768860Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv_batched_strided.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7770009Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/grouped_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7771087Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_sparse_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7772384Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_universal_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7773400Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7774519Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7775652Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7776748Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7777807Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7778896Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7780117Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7781376Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7782492Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7783827Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7785035Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7786210Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7787329Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7788298Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7789357Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7790480Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7791820Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7792809Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7793862Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm_array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7795170Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7796332Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7797308Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7798480Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7799633Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7800837Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7801844Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7803061Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7804225Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7805297Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7806346Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7807407Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7808396Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7809485Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7810567Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7811872Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/static_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7812905Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/symm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7813994Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7815052Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7816224Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7817246Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/trmm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T10:37:05.7817655Z #47 874.7 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:05.7818613Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:05.7819599Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm50.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:05.7820574Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm60.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:05.7821622Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm61.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T10:37:05.7822079Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7823372Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_ell_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7824431Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_gemv_core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7825563Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7826625Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7827753Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7828844Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7829916Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7831116Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7832255Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7833507Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7834742Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7835831Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7837022Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7838229Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7839463Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7840610Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7841782Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7842971Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7844129Z #47 874.7 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7845389Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7846532Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7847679Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_sparse_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7848891Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7850268Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7851494Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7852622Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/gemv.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7853721Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/index_remat.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7854799Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7855931Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_blas3_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7857207Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7858363Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7859588Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7860718Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7861971Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7863278Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7864405Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_singlestage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7865633Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7866812Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7867908Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7869164Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7870271Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7871511Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T10:37:05.7871898Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7873060Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7874153Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7875197Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7876457Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7877650Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7878682Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 
2025-09-07T10:37:05.7879735Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7880684Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7881673Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7882721Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7883858Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7884891Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7886161Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7887189Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 
2025-09-07T10:37:05.7888122Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7889097Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7890115Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7891203Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7892385Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_sparse_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7893408Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7894572Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7895686Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7896744Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7897816Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7898955Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7900037Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7901166Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7902286Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7903505Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7904639Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7905621Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7906653Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_with_reduction_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7907704Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/scale_bias_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7908834Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/softmax_scale_bias_transform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7909819Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/tile_iterator_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T10:37:05.7910703Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7911612Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm_coord.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7912370Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/half.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7913218Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/integer_subbyte.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7914064Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7914910Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7915727Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/kernel_launch.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7916084Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7916939Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/layout.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7917804Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7918660Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/permute.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7919541Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7920436Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7921381Z 
#47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7922327Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7923283Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7924163Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/vector.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T10:37:05.7924948Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7925745Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/matrix_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7926538Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/matrix_shape.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7927410Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/numeric_conversion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7928207Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/numeric_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 
2025-09-07T10:37:05.7929038Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/numeric_types.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7929420Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.7930331Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline/pipeline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.7931499Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline/sm100_pipeline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.7932459Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline/sm90_pipeline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T10:37:05.7933310Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pitch_linear_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7933700Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T10:37:05.7934621Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/platform/platform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T10:37:05.7935475Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/predicate_vector.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7936309Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/quaternion.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7937125Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/real.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7937507Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction 2025-09-07T10:37:05.7937943Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.7938981Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/reduce_split_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.7940052Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/tensor_reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.7941190Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_contiguous.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.7942295Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_strided.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T10:37:05.7942721Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.7943932Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/reduce_softmax_final.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.7944935Z #47 874.7 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/reduce_split_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.7946064Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_contiguous.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.7947133Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_strided.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T10:37:05.7947549Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T10:37:05.7948532Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread/reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T10:37:05.7949906Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread/reduction_operators.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T10:37:05.7950899Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/threadblock_swizzle.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction 2025-09-07T10:37:05.7951764Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/relatively_equal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7952586Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/semaphore.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7953540Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/subbyte_reference.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7954364Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7955170Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_ref.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7956067Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_ref_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7956930Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_view.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7957824Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_view_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7958652Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tfloat32.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7959018Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T10:37:05.7959895Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/thread/matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T10:37:05.7960731Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/trace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.7961114Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform 2025-09-07T10:37:05.7961711Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T10:37:05.7962827Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/collective/sm90_wgmma_transpose.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T10:37:05.7963237Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T10:37:05.7964332Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/device/transform_universal_adapter.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T10:37:05.7964756Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.7965837Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel/filter_format_transformer.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.7966927Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel/sm90_sparse_gemm_compressor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.7967981Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel/sparse_gemm_compressor.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T10:37:05.7968944Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/pitch_linear_thread_map.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform 2025-09-07T10:37:05.7969394Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T10:37:05.7970389Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread/transpose.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T10:37:05.7971599Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread/unary_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T10:37:05.7972112Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7973210Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/ell_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7974416Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7975607Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7976870Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7978140Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7979370Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7980636Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_2dthreadtile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7981882Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7983323Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_triangular_matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7984501Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7985594Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_2dthreadtile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7986707Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_triangular_matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7987820Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7988912Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_scale_bias_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7989969Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7991094Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7992239Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear_direct_conv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7993328Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7994455Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7995504Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7996566Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7997689Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear_2dthreadtile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7998753Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.7999820Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.8000819Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/vector_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T10:37:05.8001188Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T10:37:05.8002133Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/warp/vector_fragment_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T10:37:05.8002882Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/uint128.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.8003590Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/version.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.8004305Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/wmma_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.8005038Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/workspace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T10:37:05.8005329Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools 2025-09-07T10:37:05.8005595Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util 2025-09-07T10:37:05.8005904Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include 2025-09-07T10:37:05.8006246Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass 2025-09-07T10:37:05.8006614Z #47 874.7 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8007522Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/GPU_Clock.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8008451Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/command_line.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8009378Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/cublas_wrappers.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8010291Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/debug.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8011236Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_dump.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8012444Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_groupnorm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8013484Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_layernorm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8014507Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_memory.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8015555Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nchw_to_nhwc.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8016600Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nhwc_padding.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8017646Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nhwc_pooling.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8018735Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nhwc_to_nchw.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8019756Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_rmsnorm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8020781Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8021845Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/distribution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8022866Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/exceptions.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8024094Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/gett_commandline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8024997Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/helper_cuda.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8026160Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_reorder.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8027090Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_tensor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8028047Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_tensor_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8028971Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_uncompress.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8030343Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/index_sequence.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8031495Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8032617Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/packed_stride.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8033673Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/print_error.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8034206Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference 2025-09-07T10:37:05.8034733Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T10:37:05.8036034Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail/inner_product.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T10:37:05.8037288Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail/linear_to_coordinate.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T10:37:05.8037795Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8039041Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8040221Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8041419Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8042643Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gemm_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8043864Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gett.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8044445Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.8045792Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.8047086Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.8048361Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_foreach.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T10:37:05.8049897Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/rank_2k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8051226Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_compare.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8052471Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8053722Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_foreach.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8055038Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8056282Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T10:37:05.8056889Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T10:37:05.8058179Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T10:37:05.8058710Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8059898Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/conv.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8061137Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8062397Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/error_metrics.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8063653Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8064743Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8065849Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gemm_planar_complex.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8066918Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gett.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8067964Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8069045Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8070141Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/rank_k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8071217Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/symm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8072302Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/symm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8073390Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8074525Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8075616Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_copy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8076727Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8077796Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8078919Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8080030Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_foreach.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8081109Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_norm.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8082193Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8083282Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8084338Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8085409Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/trmm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T10:37:05.8086325Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/tensor_view_io.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8087242Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/type_traits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T10:37:05.8087465Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog 2025-09-07T10:37:05.8087726Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include 2025-09-07T10:37:05.8088004Z #47 874.7 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8088684Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/async.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8089453Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/async_logger-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8090164Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/async_logger.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8090461Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.8091243Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/argv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.8092195Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/env.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.8093044Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/helpers-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.8093916Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/helpers.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T10:37:05.8825904Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/common-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8826779Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/common.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8827138Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8828041Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/backtracer-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8828935Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/backtracer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8829801Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/circular_q.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8830692Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/console_globals.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8831581Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/file_helper-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8832443Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/file_helper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8833291Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/fmt_helper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8834232Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 
2025-09-07T10:37:05.8835123Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8836005Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg_buffer-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8836915Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg_buffer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8837807Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/mpmc_blocking_q.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8838654Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/null_mutex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8839492Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/os-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8840442Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/os.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8841329Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/periodic_worker-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8842237Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/periodic_worker.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8843087Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/registry-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8843914Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/registry.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8844821Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/synchronous_factory.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8845694Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/tcp_client-windows.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8846514Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/tcp_client.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8847382Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/thread_pool-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8848218Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/thread_pool.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8849429Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/udp_client-windows.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8850546Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/udp_client.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8851531Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/windows_include.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T10:37:05.8851887Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8852763Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bin_to_hex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8853144Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8854062Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/args.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8854973Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/chrono.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8855876Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/color.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8856851Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/compile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8857749Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 
2025-09-07T10:37:05.8858762Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/fmt.license.rst -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8859701Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/format-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8860612Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/format.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8861538Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/locale.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8862421Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/os.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8863433Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/ostream.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8864329Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/printf.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8865215Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/ranges.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8866075Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/std.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8867000Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/xchar.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T10:37:05.8867793Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/chrono.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8868606Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/compile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8869408Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/fmt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8870187Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/ostr.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8870989Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/ranges.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8871753Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/std.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8872532Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/xchar.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T10:37:05.8873345Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/formatter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 
2025-09-07T10:37:05.8874071Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fwd.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8874866Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/logger-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8875623Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/logger.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8876341Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/mdc.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8877182Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/pattern_formatter-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8877989Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/pattern_formatter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8878325Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8879179Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/android_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8880052Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ansicolor_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8880908Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ansicolor_sink.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8881788Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/base_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8882609Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/base_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8883473Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/basic_file_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8884333Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/basic_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8885214Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/callback_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8886061Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/daily_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8886888Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/dist_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8887731Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/dup_filter_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8888626Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/hourly_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8889450Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/kafka_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8890305Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/mongo_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8891203Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/msvc_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8892225Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/null_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8893101Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ostream_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8893955Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/qt_sinks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8894847Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ringbuffer_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8895778Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/rotating_file_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8896672Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/rotating_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8897524Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8898400Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8899316Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_color_sinks-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8900204Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_color_sinks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8901131Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_sinks-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8902002Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_sinks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8902859Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/syslog_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8903843Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/systemd_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8904655Z #47 874.7 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/tcp_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8905508Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/udp_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8906394Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/win_eventlog_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8907267Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/wincolor_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8908126Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/wincolor_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T10:37:05.8908894Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/spdlog-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8909643Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/spdlog.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8910425Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/stopwatch.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8911178Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/tweakme.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8911926Z #47 874.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/version.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T10:37:05.8912174Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/fused_moe 2025-09-07T10:37:05.8912722Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/fused_moe 2025-09-07T10:37:05.8913284Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe/core.py -> build/bdist.linux-x86_64/wheel/./flashinfer/fused_moe 2025-09-07T10:37:05.8913836Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/fused_moe 2025-09-07T10:37:05.8914042Z #47 874.7 creating build/bdist.linux-x86_64/wheel/flashinfer/jit 2025-09-07T10:37:05.8914529Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T10:37:05.8915061Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/activation.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T10:37:05.8915567Z #47 874.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/core.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T10:37:05.8916055Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cpp_ext.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T10:37:05.8916589Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cubin_loader.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T10:37:05.8917062Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/env.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T10:37:05.8917546Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 
2025-09-07T10:37:05.8917807Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/jit/attention 2025-09-07T10:37:05.8918404Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T10:37:05.8919059Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/pytorch.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T10:37:05.8919658Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/tvm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T10:37:05.8920284Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T10:37:05.8920901Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/variants.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T10:37:05.8921186Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/jit/cutlass_gemm 2025-09-07T10:37:05.8921808Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/cutlass_gemm 2025-09-07T10:37:05.8922483Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm/cutlass_library.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/cutlass_gemm 2025-09-07T10:37:05.8923176Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm/generate_kernels.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/cutlass_gemm 2025-09-07T10:37:05.8923400Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/testing 2025-09-07T10:37:05.8923932Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/testing/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/testing 2025-09-07T10:37:05.8924475Z #47 874.8 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/testing/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/testing 2025-09-07T10:37:05.8924693Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/triton 2025-09-07T10:37:05.8925207Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8925776Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/activation.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8926308Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/cascade.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8926851Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8927376Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/norm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8927886Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/page.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8928465Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/sm_constraint_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8929031Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T10:37:05.8929286Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/triton/kernels 2025-09-07T10:37:05.8929896Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T10:37:05.8930557Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/activation.py -> 
build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T10:37:05.8931250Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/cascade.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T10:37:05.8932037Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/norm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T10:37:05.8932716Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/quant.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T10:37:05.8933408Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/sm_constraint_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T10:37:05.8933670Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/tuning_configs 2025-09-07T10:37:05.8934448Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/tuning_configs/v0_1_trtllm_fused_moe_NVIDIA_B200.py -> build/bdist.linux-x86_64/wheel/./flashinfer/tuning_configs 2025-09-07T10:37:05.8934688Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/profiler 2025-09-07T10:37:05.8935250Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/profiler/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/profiler 2025-09-07T10:37:05.8935480Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/comm 2025-09-07T10:37:05.8936002Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8936529Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/cuda_ipc.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8937089Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/dlpack_utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8937621Z #47 874.8 
copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/mapping.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8938135Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/mnnvl.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8938675Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/nvshmem.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8939253Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/nvshmem_allreduce.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8939819Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/trtllm_alltoall.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8940389Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/trtllm_ar.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8940945Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/trtllm_mnnvl_ar.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8941459Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/vllm_ar.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T10:37:05.8941694Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/cudnn 2025-09-07T10:37:05.8942218Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/cudnn/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cudnn 2025-09-07T10:37:05.8942894Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/cudnn/decode.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cudnn 2025-09-07T10:37:05.8943429Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/cudnn/prefill.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cudnn 2025-09-07T10:37:05.8943806Z #47 874.8 creating build/bdist.linux-x86_64/wheel/flashinfer/logits_processor 2025-09-07T10:37:05.8944375Z #47 
874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8944973Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/compiler.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8945566Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/fusion_rules.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8946208Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/legalization.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8946755Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/op.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8947370Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/operators.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8948157Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/pipeline.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8948937Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/processors.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8949904Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/types.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8950589Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/validators.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T10:37:05.8951057Z #47 874.8 copying build/lib.linux-x86_64-cpython-312/flashinfer/py.typed -> build/bdist.linux-x86_64/wheel/./flashinfer 
2025-09-07T10:37:05.8951177Z #47 874.8 running install_egg_info 2025-09-07T10:37:05.8951306Z #47 874.8 running egg_info 2025-09-07T10:37:05.8951456Z #47 874.8 creating flashinfer_python.egg-info 2025-09-07T10:37:05.8951623Z #47 874.8 writing flashinfer_python.egg-info/PKG-INFO 2025-09-07T10:37:05.8951938Z #47 874.8 writing dependency_links to flashinfer_python.egg-info/dependency_links.txt 2025-09-07T10:37:05.8952207Z #47 874.8 writing requirements to flashinfer_python.egg-info/requires.txt 2025-09-07T10:37:05.8952471Z #47 874.8 writing top-level names to flashinfer_python.egg-info/top_level.txt 2025-09-07T10:37:05.8952723Z #47 874.8 writing manifest file 'flashinfer_python.egg-info/SOURCES.txt' 2025-09-07T10:37:05.8952984Z #47 874.9 reading manifest file 'flashinfer_python.egg-info/SOURCES.txt' 2025-09-07T10:37:06.0512542Z #47 874.9 adding license file 'LICENSE' 2025-09-07T10:37:06.0513204Z #47 874.9 adding license file 'licenses/LICENSE.cutlass.txt' 2025-09-07T10:37:06.0513764Z #47 874.9 adding license file 'licenses/LICENSE.flashattention3.txt' 2025-09-07T10:37:06.0514297Z #47 874.9 adding license file 'licenses/LICENSE.fmt.txt' 2025-09-07T10:37:06.0514783Z #47 874.9 adding license file 'licenses/LICENSE.spdlog.txt' 2025-09-07T10:37:06.0515344Z #47 874.9 writing manifest file 'flashinfer_python.egg-info/SOURCES.txt' 2025-09-07T10:37:06.0516209Z #47 874.9 Copying flashinfer_python.egg-info to build/bdist.linux-x86_64/wheel/./flashinfer_python-0.2.14.post1-py3.12.egg-info 2025-09-07T10:37:06.0516953Z #47 874.9 running install_scripts 2025-09-07T10:37:06.0517588Z #47 874.9 creating build/bdist.linux-x86_64/wheel/flashinfer_python-0.2.14.post1.dist-info/WHEEL 2025-09-07T10:37:06.0518804Z #47 874.9 creating '/workspace/wheels/flashinfer/.tmp-mfsd6jox/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it 2025-09-07T10:37:06.0519784Z #47 874.9 adding 'flashinfer/__init__.py' 2025-09-07T10:37:06.0520153Z #47 874.9 
adding 'flashinfer/__main__.py' 2025-09-07T10:37:06.0520539Z #47 874.9 adding 'flashinfer/_build_meta.py' 2025-09-07T10:37:06.0520944Z #47 874.9 adding 'flashinfer/activation.py' 2025-09-07T10:37:06.0521321Z #47 874.9 adding 'flashinfer/aot.py' 2025-09-07T10:37:06.0521671Z #47 874.9 adding 'flashinfer/artifacts.py' 2025-09-07T10:37:06.0522054Z #47 874.9 adding 'flashinfer/attention.py' 2025-09-07T10:37:06.0522435Z #47 874.9 adding 'flashinfer/autotuner.py' 2025-09-07T10:37:06.0522797Z #47 874.9 adding 'flashinfer/cascade.py' 2025-09-07T10:37:06.0523175Z #47 874.9 adding 'flashinfer/cuda_utils.py' 2025-09-07T10:37:06.0523539Z #47 874.9 adding 'flashinfer/decode.py' 2025-09-07T10:37:06.0523912Z #47 874.9 adding 'flashinfer/deep_gemm.py' 2025-09-07T10:37:06.0524348Z #47 874.9 adding 'flashinfer/fp4_quantization.py' 2025-09-07T10:37:06.0524784Z #47 874.9 adding 'flashinfer/fp8_quantization.py' 2025-09-07T10:37:06.0525174Z #47 874.9 adding 'flashinfer/gemm.py' 2025-09-07T10:37:06.0525641Z #47 874.9 adding 'flashinfer/green_ctx.py' 2025-09-07T10:37:06.0525994Z #47 874.9 adding 'flashinfer/mla.py' 2025-09-07T10:37:06.0526368Z #47 874.9 adding 'flashinfer/norm.py' 2025-09-07T10:37:06.0526716Z #47 874.9 adding 'flashinfer/page.py' 2025-09-07T10:37:06.0527045Z #47 874.9 adding 'flashinfer/pod.py' 2025-09-07T10:37:06.0527391Z #47 874.9 adding 'flashinfer/prefill.py' 2025-09-07T10:37:06.0527740Z #47 874.9 adding 'flashinfer/py.typed' 2025-09-07T10:37:06.0528111Z #47 874.9 adding 'flashinfer/quantization.py' 2025-09-07T10:37:06.0528477Z #47 874.9 adding 'flashinfer/rope.py' 2025-09-07T10:37:06.0528831Z #47 874.9 adding 'flashinfer/sampling.py' 2025-09-07T10:37:06.0529186Z #47 874.9 adding 'flashinfer/sparse.py' 2025-09-07T10:37:06.0529553Z #47 874.9 adding 'flashinfer/tllm_utils.py' 2025-09-07T10:37:06.0529918Z #47 874.9 adding 'flashinfer/utils.py' 2025-09-07T10:37:06.0530272Z #47 874.9 adding 'flashinfer/comm/__init__.py' 2025-09-07T10:37:06.0530667Z #47 874.9 adding 
'flashinfer/comm/cuda_ipc.py' 2025-09-07T10:37:06.0531156Z #47 874.9 adding 'flashinfer/comm/dlpack_utils.py' 2025-09-07T10:37:06.0531748Z #47 874.9 adding 'flashinfer/comm/mapping.py' 2025-09-07T10:37:06.0532182Z #47 874.9 adding 'flashinfer/comm/mnnvl.py' 2025-09-07T10:37:06.0532592Z #47 874.9 adding 'flashinfer/comm/nvshmem.py' 2025-09-07T10:37:06.0533024Z #47 874.9 adding 'flashinfer/comm/nvshmem_allreduce.py' 2025-09-07T10:37:06.0533502Z #47 874.9 adding 'flashinfer/comm/trtllm_alltoall.py' 2025-09-07T10:37:06.0533952Z #47 874.9 adding 'flashinfer/comm/trtllm_ar.py' 2025-09-07T10:37:06.0534383Z #47 874.9 adding 'flashinfer/comm/trtllm_mnnvl_ar.py' 2025-09-07T10:37:06.0534826Z #47 874.9 adding 'flashinfer/comm/vllm_ar.py' 2025-09-07T10:37:06.0535225Z #47 874.9 adding 'flashinfer/cudnn/__init__.py' 2025-09-07T10:37:06.0535645Z #47 874.9 adding 'flashinfer/cudnn/decode.py' 2025-09-07T10:37:06.0536050Z #47 874.9 adding 'flashinfer/cudnn/prefill.py' 2025-09-07T10:37:06.0536553Z #47 874.9 adding 'flashinfer/cute_dsl/blockscaled_gemm.py' 2025-09-07T10:37:06.0537019Z #47 874.9 adding 'flashinfer/cute_dsl/utils.py' 2025-09-07T10:37:06.0537446Z #47 874.9 adding 'flashinfer/data/custom_backend.py' 2025-09-07T10:37:06.0537878Z #47 874.9 adding 'flashinfer/data/setup.py' 2025-09-07T10:37:06.0538273Z #47 874.9 adding 'flashinfer/data/version.txt' 2025-09-07T10:37:06.0539949Z #47 875.0 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:06.1613645Z #47 875.1 adding 
'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:06.3861894Z #47 875.4 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:06.6332347Z #47 875.6 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:06.8619349Z #47 875.8 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:07.1043645Z #47 876.1 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:07.2134158Z #47 876.2 adding 
'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:07.3174132Z #47 876.3 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:07.4867121Z #47 876.5 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so' 2025-09-07T10:37:07.6633239Z #47 876.5 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so' 2025-09-07T10:37:07.6794900Z #47 876.6 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so' 2025-09-07T10:37:07.8559809Z #47 876.7 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so' 
2025-09-07T10:37:08.8545017Z #47 877.8 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so' 2025-09-07T10:37:09.0250144Z #47 878.0 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so' 2025-09-07T10:37:10.1805648Z #47 879.2 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so' 2025-09-07T10:37:10.3575196Z #47 879.3 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so' 2025-09-07T10:37:11.5091942Z #47 880.5 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so' 2025-09-07T10:37:11.6780832Z #47 880.6 adding 
'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so' 2025-09-07T10:37:12.8275845Z #47 881.8 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so' 2025-09-07T10:37:13.0043361Z #47 882.0 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so' 2025-09-07T10:37:14.4345212Z #47 883.4 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:14.6326477Z #47 883.6 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T10:37:14.8829735Z #47 883.7 adding 
'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T10:37:15.8482029Z #47 884.8 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:17.9132042Z #47 886.9 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:19.4929706Z #47 888.5 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:19.6327362Z #47 888.6 adding 
'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T10:37:19.8646566Z #47 888.7 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T10:37:21.5701794Z #47 890.5 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:23.0365411Z #47 892.0 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:24.5181355Z #47 893.5 adding 
'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:24.7124744Z #47 893.7 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T10:37:24.9618310Z #47 893.8 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T10:37:25.9176967Z #47 894.9 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:26.1032038Z #47 895.0 adding 'flashinfer/data/aot/cascade/cascade.so' 2025-09-07T10:37:26.1032696Z #47 895.1 adding 'flashinfer/data/aot/fmha_cutlass_sm100a/fmha_cutlass_sm100a.so' 2025-09-07T10:37:26.2402720Z #47 895.1 adding 'flashinfer/data/aot/logging/logging.so' 2025-09-07T10:37:26.2403201Z #47 895.1 adding 'flashinfer/data/aot/mla/mla.so' 2025-09-07T10:37:26.2403635Z #47 895.2 adding 
'flashinfer/data/aot/norm/norm.so' 2025-09-07T10:37:26.4305300Z #47 895.2 adding 'flashinfer/data/aot/page/page.so' 2025-09-07T10:37:26.4305866Z #47 895.3 adding 'flashinfer/data/aot/quantization/quantization.so' 2025-09-07T10:37:26.5785005Z #47 895.5 adding 'flashinfer/data/aot/rope/rope.so' 2025-09-07T10:37:27.6199641Z #47 896.6 adding 'flashinfer/data/aot/sampling/sampling.so' 2025-09-07T10:37:27.7799158Z #47 896.8 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:27.8850610Z #47 896.9 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:28.0928660Z #47 897.1 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:28.3230460Z #47 897.3 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:28.5344114Z #47 897.5 adding 
'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:28.7604388Z #47 897.7 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:28.8659567Z #47 897.8 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:28.9660263Z #47 897.9 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T10:37:29.6006499Z #47 898.6 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:30.0892530Z #47 899.1 adding 
'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:31.0462459Z #47 900.0 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:31.7757709Z #47 900.7 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:32.6573610Z #47 901.6 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:33.3290654Z #47 902.3 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:34.0057451Z #47 903.0 adding 
'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:34.5045468Z #47 903.5 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T10:37:34.6048090Z #47 903.5 adding 'flashinfer/data/aot/trtllm_utils/trtllm_utils.so' 2025-09-07T10:37:34.6048641Z #47 903.5 adding 'flashinfer/data/csrc/activation.cu' 2025-09-07T10:37:34.6049285Z #47 903.5 adding 'flashinfer/data/csrc/aot_extension_utils.h' 2025-09-07T10:37:34.6049989Z #47 903.5 adding 'flashinfer/data/csrc/batch_attention.cu' 2025-09-07T10:37:34.6050586Z #47 903.5 adding 'flashinfer/data/csrc/batch_attention_customize_config.jinja' 2025-09-07T10:37:34.6051447Z #47 903.5 adding 'flashinfer/data/csrc/batch_attention_jit_pybind.cu' 2025-09-07T10:37:34.6052097Z #47 903.5 adding 'flashinfer/data/csrc/batch_attention_paged_kernel_inst.jinja' 2025-09-07T10:37:34.6052676Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode.cu' 2025-09-07T10:37:34.6053183Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_config.inc' 2025-09-07T10:37:34.6053764Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_customize_config.jinja' 2025-09-07T10:37:34.6054377Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_jit_pybind.cu' 2025-09-07T10:37:34.6054958Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_kernel_inst.jinja' 2025-09-07T10:37:34.6055538Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_mla_config.jinja' 2025-09-07T10:37:34.6056204Z #47 903.5 
adding 'flashinfer/data/csrc/batch_decode_mla_cute_sm80.cu' 2025-09-07T10:37:34.6056761Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_mla_plan.cu' 2025-09-07T10:37:34.6057314Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_mla_pybind.cu' 2025-09-07T10:37:34.6057949Z #47 903.5 adding 'flashinfer/data/csrc/batch_decode_mla_run.cu' 2025-09-07T10:37:34.6058730Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_config.jinja' 2025-09-07T10:37:34.6059239Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_plan.cu' 2025-09-07T10:37:34.6059717Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_pybind.cu' 2025-09-07T10:37:34.6060205Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_run.cu' 2025-09-07T10:37:34.6060683Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_sm90_plan.cu' 2025-09-07T10:37:34.6061210Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_sm90_pybind.cu' 2025-09-07T10:37:34.6061716Z #47 903.5 adding 'flashinfer/data/csrc/batch_mla_sm90_run.cu' 2025-09-07T10:37:34.6062265Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill.cu' 2025-09-07T10:37:34.6062967Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_config.inc' 2025-09-07T10:37:34.6063563Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_customize_config.jinja' 2025-09-07T10:37:34.6064249Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_fp8_paged_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6064995Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_fp8_ragged_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6065687Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_fp8_sm90.cu' 2025-09-07T10:37:34.6066234Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_jit_pybind.cu' 2025-09-07T10:37:34.6066839Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_paged_kernel_inst.jinja' 2025-09-07T10:37:34.6067508Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_paged_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6068568Z #47 903.5 adding 
'flashinfer/data/csrc/batch_prefill_ragged_kernel_inst.jinja' 2025-09-07T10:37:34.6069250Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_ragged_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6069860Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_sm90.cu' 2025-09-07T10:37:34.6080208Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_sm90_config.inc' 2025-09-07T10:37:34.6080866Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_sm90_customize_config.jinja' 2025-09-07T10:37:34.6081530Z #47 903.5 adding 'flashinfer/data/csrc/batch_prefill_sm90_jit_pybind.cu' 2025-09-07T10:37:34.6082098Z #47 903.5 adding 'flashinfer/data/csrc/blackwell_fmha_plan.cu' 2025-09-07T10:37:34.6082563Z #47 903.5 adding 'flashinfer/data/csrc/bmm_fp8.cu' 2025-09-07T10:37:34.6082990Z #47 903.5 adding 'flashinfer/data/csrc/cascade.cu' 2025-09-07T10:37:34.6083475Z #47 903.5 adding 'flashinfer/data/csrc/cudnn_sdpa_kernel_launcher.cu' 2025-09-07T10:37:34.6084004Z #47 903.5 adding 'flashinfer/data/csrc/cudnn_sdpa_utils.h' 2025-09-07T10:37:34.6084457Z #47 903.5 adding 'flashinfer/data/csrc/cutlass_mla.cu' 2025-09-07T10:37:34.6084949Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_cascade_ops.cu' 2025-09-07T10:37:34.6085478Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_gemm_ops.cu' 2025-09-07T10:37:34.6085994Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_gemm_sm90_ops.cu' 2025-09-07T10:37:34.6086611Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_mla_ops.cu' 2025-09-07T10:37:34.6087103Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_norm_ops.cu' 2025-09-07T10:37:34.6087599Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_ops.cu' 2025-09-07T10:37:34.6088075Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_ops_sm90.cu' 2025-09-07T10:37:34.6088586Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_page_ops.cu' 2025-09-07T10:37:34.6089134Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_quantization_ops.cu' 2025-09-07T10:37:34.6089673Z #47 903.5 adding 
'flashinfer/data/csrc/flashinfer_rope_ops.cu' 2025-09-07T10:37:34.6090249Z #47 903.5 adding 'flashinfer/data/csrc/flashinfer_sampling_ops.cu' 2025-09-07T10:37:34.6090760Z #47 903.5 adding 'flashinfer/data/csrc/fmha_cutlass_sm100.cu' 2025-09-07T10:37:34.6091434Z #47 903.5 adding 'flashinfer/data/csrc/fmha_cutlass_sm100_pybind.cu' 2025-09-07T10:37:34.6092221Z #47 903.5 adding 'flashinfer/data/csrc/fp4_gemm_cutlass.cu' 2025-09-07T10:37:34.6092878Z #47 903.5 adding 'flashinfer/data/csrc/fp4_gemm_cutlass.jinja' 2025-09-07T10:37:34.6093454Z #47 903.5 adding 'flashinfer/data/csrc/fp8_gemm_cutlass.cu' 2025-09-07T10:37:34.6093953Z #47 903.5 adding 'flashinfer/data/csrc/fp8_gemm_cutlass.jinja' 2025-09-07T10:37:34.6094483Z #47 903.5 adding 'flashinfer/data/csrc/gemm_groupwise_sm100.cu' 2025-09-07T10:37:34.6095158Z #47 903.5 adding 'flashinfer/data/csrc/gemm_groupwise_sm100_kernel_inst.jinja' 2025-09-07T10:37:34.6095763Z #47 903.5 adding 'flashinfer/data/csrc/gemm_sm100_pybind.cu' 2025-09-07T10:37:34.6096231Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm.cu' 2025-09-07T10:37:34.6096925Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100.cu' 2025-09-07T10:37:34.6098031Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100_kernel_inst.jinja' 2025-09-07T10:37:34.6098826Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100.cu' 2025-09-07T10:37:34.6099563Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100_kernel_inst.jinja' 2025-09-07T10:37:34.6100233Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_sm100_pybind.cu' 2025-09-07T10:37:34.6100817Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_sm90.cu' 2025-09-07T10:37:34.6101365Z #47 903.5 adding 'flashinfer/data/csrc/group_gemm_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6101970Z #47 903.5 adding 'flashinfer/data/csrc/logging.cc' 2025-09-07T10:37:34.6102413Z #47 903.5 adding 'flashinfer/data/csrc/norm.cu' 2025-09-07T10:37:34.6102855Z #47 903.5 
adding 'flashinfer/data/csrc/nvshmem_binding.cu' 2025-09-07T10:37:34.6103316Z #47 903.5 adding 'flashinfer/data/csrc/page.cu' 2025-09-07T10:37:34.6103841Z #47 903.5 adding 'flashinfer/data/csrc/pod.cu' 2025-09-07T10:37:34.6104268Z #47 903.5 adding 'flashinfer/data/csrc/pod_config.inc' 2025-09-07T10:37:34.6104757Z #47 903.5 adding 'flashinfer/data/csrc/pod_customize_config.jinja' 2025-09-07T10:37:34.6105281Z #47 903.5 adding 'flashinfer/data/csrc/pod_jit_pybind.cu' 2025-09-07T10:37:34.6105852Z #47 903.5 adding 'flashinfer/data/csrc/pod_kernel_inst.jinja' 2025-09-07T10:37:34.6106452Z #47 903.5 adding 'flashinfer/data/csrc/pytorch_conversion_utils.h' 2025-09-07T10:37:34.6106997Z #47 903.5 adding 'flashinfer/data/csrc/pytorch_extension_utils.h' 2025-09-07T10:37:34.6107486Z #47 903.5 adding 'flashinfer/data/csrc/quantization.cu' 2025-09-07T10:37:34.6107931Z #47 903.5 adding 'flashinfer/data/csrc/renorm.cu' 2025-09-07T10:37:34.6108332Z #47 903.5 adding 'flashinfer/data/csrc/rope.cu' 2025-09-07T10:37:34.6108760Z #47 903.5 adding 'flashinfer/data/csrc/runtime_utils.h' 2025-09-07T10:37:34.6109332Z #47 903.5 adding 'flashinfer/data/csrc/sampling.cu' 2025-09-07T10:37:34.6109945Z #47 903.5 adding 'flashinfer/data/csrc/single_decode.cu' 2025-09-07T10:37:34.6110439Z #47 903.5 adding 'flashinfer/data/csrc/single_decode_config.inc' 2025-09-07T10:37:34.6111012Z #47 903.5 adding 'flashinfer/data/csrc/single_decode_customize_config.jinja' 2025-09-07T10:37:34.6111682Z #47 903.5 adding 'flashinfer/data/csrc/single_decode_jit_pybind.cu' 2025-09-07T10:37:34.6112238Z #47 903.5 adding 'flashinfer/data/csrc/single_decode_kernel_inst.jinja' 2025-09-07T10:37:34.6112778Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill.cu' 2025-09-07T10:37:34.6113326Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_config.inc' 2025-09-07T10:37:34.6114036Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_customize_config.jinja' 2025-09-07T10:37:34.6114640Z #47 903.5 adding 
'flashinfer/data/csrc/single_prefill_fp8_sm90.cu' 2025-09-07T10:37:34.6115242Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_fp8_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6115873Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_jit_pybind.cu' 2025-09-07T10:37:34.6116487Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_kernel_inst.jinja' 2025-09-07T10:37:34.6117050Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_sm90.cu' 2025-09-07T10:37:34.6117585Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_sm90_config.inc' 2025-09-07T10:37:34.6118231Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_sm90_customize_config.jinja' 2025-09-07T10:37:34.6118895Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_sm90_jit_pybind.cu' 2025-09-07T10:37:34.6119515Z #47 903.5 adding 'flashinfer/data/csrc/single_prefill_sm90_kernel_inst.jinja' 2025-09-07T10:37:34.6120089Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_allreduce.cu' 2025-09-07T10:37:34.6120593Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_allreduce_fusion.cu' 2025-09-07T10:37:34.6121110Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_alltoall.cu' 2025-09-07T10:37:34.6121741Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_batched_gemm_runner.cu' 2025-09-07T10:37:34.6122534Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fmha_kernel_launcher.cu' 2025-09-07T10:37:34.6123181Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fused_moe_dev_kernel.cu' 2025-09-07T10:37:34.6123775Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fused_moe_kernel_launcher.cu' 2025-09-07T10:37:34.6124415Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fused_moe_routing_deepseek.cu' 2025-09-07T10:37:34.6125034Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fused_moe_routing_llama4.cu' 2025-09-07T10:37:34.6125720Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fused_moe_routing_renormalize.cu' 2025-09-07T10:37:34.6126335Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_fused_moe_runner.cu' 2025-09-07T10:37:34.6127065Z #47 
903.5 adding 'flashinfer/data/csrc/trtllm_gemm_runner.cu' 2025-09-07T10:37:34.6127792Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_mnnvl_allreduce.cu' 2025-09-07T10:37:34.6128343Z #47 903.5 adding 'flashinfer/data/csrc/trtllm_moe_allreduce_fusion.cu' 2025-09-07T10:37:34.6128911Z #47 903.5 adding 'flashinfer/data/csrc/vllm_custom_all_reduce.cu' 2025-09-07T10:37:34.6129734Z #47 903.5 adding 'flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_instantiation.cu' 2025-09-07T10:37:34.6130736Z #47 903.6 adding 'flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh' 2025-09-07T10:37:34.6131881Z #47 903.6 adding 'flashinfer/data/csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_ops.cu' 2025-09-07T10:37:34.6132827Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/cpp/common/envUtils.cpp' 2025-09-07T10:37:34.6133566Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/cpp/common/logger.cpp' 2025-09-07T10:37:34.6134210Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/cpp/common/memoryUtils.cu' 2025-09-07T10:37:34.6134896Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/cpp/common/stringUtils.cpp' 2025-09-07T10:37:34.6135768Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/cpp/common/tllmException.cpp' 2025-09-07T10:37:34.6136482Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/cpp/kernels/quantization.cu' 2025-09-07T10:37:34.6137262Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/NvInferRuntime.h' 2025-09-07T10:37:34.6138063Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/assert.h' 2025-09-07T10:37:34.6139105Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaBf16Wrapper.h' 2025-09-07T10:37:34.6139949Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaFp8Utils.h' 2025-09-07T10:37:34.6140766Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaUtils.h' 
2025-09-07T10:37:34.6141565Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/dataType.h' 2025-09-07T10:37:34.6142332Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/logger.h' 2025-09-07T10:37:34.6143129Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/quantization.h' 2025-09-07T10:37:34.6144062Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/stringUtils.h' 2025-09-07T10:37:34.6144880Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/tllmException.h' 2025-09-07T10:37:34.6145673Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cublasMMWrapper.h' 2025-09-07T10:37:34.6146439Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaBf16Fallbacks.cuh' 2025-09-07T10:37:34.6147290Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaDriverWrapper.h' 2025-09-07T10:37:34.6148166Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaTypeUtils.cuh' 2025-09-07T10:37:34.6149066Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/envUtils.h' 2025-09-07T10:37:34.6150046Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/memoryUtils.h' 2025-09-07T10:37:34.6150826Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/quantTypeUtils.cuh' 2025-09-07T10:37:34.6151726Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh' 2025-09-07T10:37:34.6152496Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/workspace.h' 2025-09-07T10:37:34.6153898Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/compute_occupancy.h' 2025-09-07T10:37:34.6155183Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue_helpers.h' 
2025-09-07T10:37:34.6156380Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm_configs.h' 2025-09-07T10:37:34.6157643Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h' 2025-09-07T10:37:34.6158909Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/system_barrier.h' 2025-09-07T10:37:34.6160515Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/tile_interleaved_layout.h' 2025-09-07T10:37:34.6162013Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/weight_only_quant_op.h' 2025-09-07T10:37:34.6163229Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_red_global.hpp' 2025-09-07T10:37:34.6164490Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_sm90_multimem.hpp' 2025-09-07T10:37:34.6165774Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_traits_sm90_multimem.hpp' 2025-09-07T10:37:34.6167087Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/grid_dependency_control.h' 2025-09-07T10:37:34.6168282Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/mma.h' 2025-09-07T10:37:34.6169739Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective/sm90_allreduce_nvls_warpspecialized.hpp' 2025-09-07T10:37:34.6171378Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective/mixed_input_utils.hpp' 2025-09-07T10:37:34.6173018Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective/epilogue_moe_finalize.hpp' 2025-09-07T10:37:34.6174616Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_allreduce_tma_warpspecialized.hpp' 2025-09-07T10:37:34.6176444Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread/fused_activations.h' 2025-09-07T10:37:34.6177885Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_gated.hpp' 2025-09-07T10:37:34.6179463Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_interleaved.hpp' 2025-09-07T10:37:34.6181388Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_mixed_input.hpp' 2025-09-07T10:37:34.6182950Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_array_mixed_input.hpp' 2025-09-07T10:37:34.6184704Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_gated.hpp' 2025-09-07T10:37:34.6186534Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_interleaved.hpp' 2025-09-07T10:37:34.6188145Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp' 2025-09-07T10:37:34.7250275Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T10:37:34.7252183Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T10:37:34.7253979Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_interleaved_tma_gmma_rs_warpspecialized_mixed_input.hpp' 2025-09-07T10:37:34.7255727Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_gated.inl' 2025-09-07T10:37:34.7257348Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_interleaved.inl' 2025-09-07T10:37:34.7258992Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_mixed_input.inl' 2025-09-07T10:37:34.7260522Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/default_fpA_intB_traits.h' 2025-09-07T10:37:34.7261909Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel.cuh' 2025-09-07T10:37:34.7263384Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_routine.cuh' 2025-09-07T10:37:34.7264775Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_traits.cuh' 2025-09-07T10:37:34.7266228Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_moe_problem_visitor.h' 2025-09-07T10:37:34.7267599Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_universal_allreduce.hpp' 2025-09-07T10:37:34.7268955Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/mixed_gemm_B_layout.h' 2025-09-07T10:37:34.7270238Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cute_util.cuh' 2025-09-07T10:37:34.7271590Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cutlass_kernel.h' 2025-09-07T10:37:34.7272900Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_problem_visitor.h' 2025-09-07T10:37:34.7274334Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized.hpp' 2025-09-07T10:37:34.7276031Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized_pingpong.hpp' 2025-09-07T10:37:34.7277481Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma.h' 2025-09-07T10:37:34.7278813Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_multistage.h' 2025-09-07T10:37:34.7280257Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_pipelined.h' 2025-09-07T10:37:34.7281590Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma.h' 2025-09-07T10:37:34.7282882Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma_bf16.h' 2025-09-07T10:37:34.7284161Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_base.h' 2025-09-07T10:37:34.7285435Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage.h' 2025-09-07T10:37:34.7286818Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_finegrained.h' 2025-09-07T10:37:34.7288239Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_percol.h' 2025-09-07T10:37:34.7289577Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined.h' 2025-09-07T10:37:34.7291015Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_finegrained.h' 2025-09-07T10:37:34.7292661Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_percol.h' 2025-09-07T10:37:34.7294066Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/default_mma_tensor_op.h' 2025-09-07T10:37:34.7295486Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h' 2025-09-07T10:37:34.7296951Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_dequantizer.h' 2025-09-07T10:37:34.7298480Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h' 2025-09-07T10:37:34.7299920Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util/gather_tensor.hpp' 2025-09-07T10:37:34.7300919Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu' 2025-09-07T10:37:34.7301684Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.h' 2025-09-07T10:37:34.7302526Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.cu' 2025-09-07T10:37:34.7303467Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.h' 2025-09-07T10:37:34.7304346Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.cuh' 2025-09-07T10:37:34.7305060Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.h' 2025-09-07T10:37:34.7305897Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp' 2025-09-07T10:37:34.7306816Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.h' 2025-09-07T10:37:34.7307767Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_type_conversion.h' 2025-09-07T10:37:34.7308827Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm.h' 
2025-09-07T10:37:34.7310013Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm_stub.cu' 2025-09-07T10:37:34.7311195Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu' 2025-09-07T10:37:34.7312354Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu' 2025-09-07T10:37:34.7313501Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu' 2025-09-07T10:37:34.7314637Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu' 2025-09-07T10:37:34.7315795Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu' 2025-09-07T10:37:34.7316915Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu' 2025-09-07T10:37:34.7318111Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_bf16_out_bf16.cu' 2025-09-07T10:37:34.7319554Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_f16_out_f16.cu' 2025-09-07T10:37:34.7320859Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_bf16_out_bf16.cu' 2025-09-07T10:37:34.7322151Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_f16_out_f16.cu' 2025-09-07T10:37:34.7323426Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_per_col_f16_out_f16.cu' 2025-09-07T10:37:34.7324655Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu' 2025-09-07T10:37:34.7325839Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu' 2025-09-07T10:37:34.7327045Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu' 2025-09-07T10:37:34.7328200Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu' 2025-09-07T10:37:34.7329401Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu' 2025-09-07T10:37:34.7330574Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu' 2025-09-07T10:37:34.7331900Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h' 2025-09-07T10:37:34.7333084Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h' 2025-09-07T10:37:34.7334276Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template_sm90.h' 2025-09-07T10:37:34.7335532Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.h' 2025-09-07T10:37:34.7336820Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.inl' 2025-09-07T10:37:34.7337924Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/common.h' 2025-09-07T10:37:34.7338955Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/cutlass_kernel_selector.h' 2025-09-07T10:37:34.7340050Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_gemm_kernels.h' 2025-09-07T10:37:34.7341093Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_kernels.h' 2025-09-07T10:37:34.7342112Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_util_kernels.h' 2025-09-07T10:37:34.7343239Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_bf16.cu' 2025-09-07T10:37:34.7344465Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp4.cu' 2025-09-07T10:37:34.7345553Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp8.cu' 2025-09-07T10:37:34.7346639Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint4.cu' 2025-09-07T10:37:34.7347749Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint8.cu' 2025-09-07T10:37:34.7349019Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp16.cu' 2025-09-07T10:37:34.7350355Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp4.cu' 2025-09-07T10:37:34.7351525Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint4.cu' 2025-09-07T10:37:34.7352693Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint8.cu' 2025-09-07T10:37:34.7353866Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp32_fp32.cu' 2025-09-07T10:37:34.7355024Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp4_fp4.cu' 2025-09-07T10:37:34.7356165Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp4.cu' 2025-09-07T10:37:34.7357392Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp8.cu' 2025-09-07T10:37:34.7358540Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_uint4.cu' 2025-09-07T10:37:34.7359708Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h' 2025-09-07T10:37:34.7360909Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h' 2025-09-07T10:37:34.7362269Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws_mixed_dtype.h' 2025-09-07T10:37:34.7363696Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu' 2025-09-07T10:37:34.7364851Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_tma_warp_specialized_traits.h' 2025-09-07T10:37:34.7366214Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.h' 2025-09-07T10:37:34.7367472Z #47 903.6 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.inl' 2025-09-07T10:37:34.7368691Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.h' 2025-09-07T10:37:34.7369898Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.inl' 2025-09-07T10:37:34.7371322Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.h' 2025-09-07T10:37:34.7372853Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.inl' 2025-09-07T10:37:34.7373928Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.cpp' 2025-09-07T10:37:34.7374699Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.h' 2025-09-07T10:37:34.7375441Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime/torchUtils.h' 2025-09-07T10:37:34.7376166Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp' 2025-09-07T10:37:34.7376872Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.cpp' 2025-09-07T10:37:34.7377611Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.h' 2025-09-07T10:37:34.7378340Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.cpp' 2025-09-07T10:37:34.7379087Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.h' 2025-09-07T10:37:34.7379789Z #47 903.6 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/thUtils.h' 2025-09-07T10:37:34.7380421Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/config.hpp' 2025-09-07T10:37:34.7380999Z #47 903.6 adding 
'flashinfer/data/cutlass/include/cute/int_tuple.hpp' 2025-09-07T10:37:34.7381563Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/layout.hpp' 2025-09-07T10:37:34.7382173Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/layout_composed.hpp' 2025-09-07T10:37:34.7382784Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/pointer.hpp' 2025-09-07T10:37:34.7383491Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/pointer_base.hpp' 2025-09-07T10:37:34.7384207Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/pointer_flagged.hpp' 2025-09-07T10:37:34.7384816Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/pointer_sparse.hpp' 2025-09-07T10:37:34.7385434Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/pointer_swizzle.hpp' 2025-09-07T10:37:34.7386030Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/stride.hpp' 2025-09-07T10:37:34.7386563Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/swizzle.hpp' 2025-09-07T10:37:34.7387124Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/swizzle_layout.hpp' 2025-09-07T10:37:34.7387717Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/tensor.hpp' 2025-09-07T10:37:34.7388269Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/tensor_impl.hpp' 2025-09-07T10:37:34.7388998Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/tensor_zip.hpp' 2025-09-07T10:37:34.7389589Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/underscore.hpp' 2025-09-07T10:37:34.7390189Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/axpby.hpp' 2025-09-07T10:37:34.7390864Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/clear.hpp' 2025-09-07T10:37:34.7391575Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/cooperative_copy.hpp' 2025-09-07T10:37:34.7392343Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/cooperative_gemm.hpp' 2025-09-07T10:37:34.7393146Z #47 903.6 adding 
'flashinfer/data/cutlass/include/cute/algorithm/copy.hpp' 2025-09-07T10:37:34.7393745Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/fill.hpp' 2025-09-07T10:37:34.7394388Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/functional.hpp' 2025-09-07T10:37:34.7395018Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/gemm.hpp' 2025-09-07T10:37:34.7395635Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/prefer.hpp' 2025-09-07T10:37:34.7396276Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/prefetch.hpp' 2025-09-07T10:37:34.7396974Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/tensor_algorithms.hpp' 2025-09-07T10:37:34.7397740Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/tensor_reduce.hpp' 2025-09-07T10:37:34.7398456Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/algorithm/tuple_algorithms.hpp' 2025-09-07T10:37:34.7399161Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/cluster_sm100.hpp' 2025-09-07T10:37:34.7399838Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/cluster_sm90.hpp' 2025-09-07T10:37:34.7400440Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/config.hpp' 2025-09-07T10:37:34.7401011Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy.hpp' 2025-09-07T10:37:34.7401580Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm100.hpp' 2025-09-07T10:37:34.7402215Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm100_tma.hpp' 2025-09-07T10:37:34.7402834Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm50.hpp' 2025-09-07T10:37:34.7403443Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm75.hpp' 2025-09-07T10:37:34.7404047Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm80.hpp' 2025-09-07T10:37:34.7404638Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm90.hpp' 
2025-09-07T10:37:34.7405266Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm90_desc.hpp' 2025-09-07T10:37:34.7405904Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm90_tma.hpp' 2025-09-07T10:37:34.7406502Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/mma.hpp' 2025-09-07T10:37:34.7407060Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm100.hpp' 2025-09-07T10:37:34.7407688Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm100_desc.hpp' 2025-09-07T10:37:34.7408338Z #47 903.6 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm100_umma.hpp' 2025-09-07T10:37:34.7408955Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm120.hpp' 2025-09-07T10:37:34.7409599Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm120_sparse.hpp' 2025-09-07T10:37:34.7410235Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm61.hpp' 2025-09-07T10:37:34.7410867Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm70.hpp' 2025-09-07T10:37:34.7411731Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm75.hpp' 2025-09-07T10:37:34.7412355Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm80.hpp' 2025-09-07T10:37:34.7412989Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm89.hpp' 2025-09-07T10:37:34.7413609Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90.hpp' 2025-09-07T10:37:34.7414270Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_desc.hpp' 2025-09-07T10:37:34.7414945Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma.hpp' 2025-09-07T10:37:34.7415690Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_ext.hpp' 2025-09-07T10:37:34.8252333Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse.hpp' 2025-09-07T10:37:34.8253153Z #47 903.7 adding 
'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp' 2025-09-07T10:37:34.8253921Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/simd_sm100.hpp' 2025-09-07T10:37:34.8254641Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/tmem_allocator_sm100.hpp' 2025-09-07T10:37:34.8255310Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/arch/util.hpp' 2025-09-07T10:37:34.8255929Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/atom/copy_atom.hpp' 2025-09-07T10:37:34.8256573Z #47 903.7 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits.hpp' 2025-09-07T10:37:34.8257276Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100.hpp' 2025-09-07T10:37:34.8258040Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_im2col.hpp' 2025-09-07T10:37:34.8258835Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_tma.hpp' 2025-09-07T10:37:34.8259726Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm50.hpp' 2025-09-07T10:37:34.8260452Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm75.hpp' 2025-09-07T10:37:34.8261174Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm80.hpp' 2025-09-07T10:37:34.8261929Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90.hpp' 2025-09-07T10:37:34.8262796Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_im2col.hpp' 2025-09-07T10:37:34.8263549Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp' 2025-09-07T10:37:34.8264316Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp' 2025-09-07T10:37:34.8265041Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_atom.hpp' 2025-09-07T10:37:34.8265655Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits.hpp' 2025-09-07T10:37:34.8266327Z #47 903.8 
adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm100.hpp' 2025-09-07T10:37:34.8267015Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120.hpp' 2025-09-07T10:37:34.8267748Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120_sparse.hpp' 2025-09-07T10:37:34.8268483Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm61.hpp' 2025-09-07T10:37:34.8269155Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm70.hpp' 2025-09-07T10:37:34.8269837Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm75.hpp' 2025-09-07T10:37:34.8270508Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm80.hpp' 2025-09-07T10:37:34.8271188Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm89.hpp' 2025-09-07T10:37:34.8271875Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90.hpp' 2025-09-07T10:37:34.8272572Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma.hpp' 2025-09-07T10:37:34.8273385Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_ext.hpp' 2025-09-07T10:37:34.8274163Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp' 2025-09-07T10:37:34.8274998Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp' 2025-09-07T10:37:34.8275742Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/atom/partitioner.hpp' 2025-09-07T10:37:34.8276415Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/alignment.hpp' 2025-09-07T10:37:34.8277076Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/array.hpp' 2025-09-07T10:37:34.8277741Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/array_aligned.hpp' 2025-09-07T10:37:34.8278514Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/array_subbyte.hpp' 
2025-09-07T10:37:34.8279207Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/bit_field.hpp' 2025-09-07T10:37:34.8279902Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/cuda_types.hpp' 2025-09-07T10:37:34.8280573Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/tuple.hpp' 2025-09-07T10:37:34.8281224Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/container/type_list.hpp' 2025-09-07T10:37:34.8281938Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/arithmetic_tuple.hpp' 2025-09-07T10:37:34.8282613Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/complex.hpp' 2025-09-07T10:37:34.8283226Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/int.hpp' 2025-09-07T10:37:34.8283872Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/integer_sequence.hpp' 2025-09-07T10:37:34.8284612Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/integral_constant.hpp' 2025-09-07T10:37:34.8285377Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/integral_ratio.hpp' 2025-09-07T10:37:34.8286029Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/math.hpp' 2025-09-07T10:37:34.8286684Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/numeric_types.hpp' 2025-09-07T10:37:34.8287360Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/numeric/real.hpp' 2025-09-07T10:37:34.8287958Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/util/debug.hpp' 2025-09-07T10:37:34.8288529Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/util/print.hpp' 2025-09-07T10:37:34.8289140Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/util/print_latex.hpp' 2025-09-07T10:37:34.8289774Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/util/print_svg.hpp' 2025-09-07T10:37:34.8290402Z #47 903.8 adding 'flashinfer/data/cutlass/include/cute/util/print_tensor.hpp' 2025-09-07T10:37:34.8291142Z #47 903.8 adding 
'flashinfer/data/cutlass/include/cute/util/type_traits.hpp' 2025-09-07T10:37:34.8291966Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/aligned_buffer.h' 2025-09-07T10:37:34.8292606Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/array.h' 2025-09-07T10:37:34.9259481Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/array_planar_complex.h' 2025-09-07T10:37:34.9260207Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/array_subbyte.h' 2025-09-07T10:37:34.9260836Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/barrier.h' 2025-09-07T10:37:34.9262087Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/bfloat16.h' 2025-09-07T10:37:34.9262711Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/blas3.h' 2025-09-07T10:37:34.9263419Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/blas3_types.h' 2025-09-07T10:37:34.9264021Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/block_striped.h' 2025-09-07T10:37:34.9264673Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/cluster_launch.hpp' 2025-09-07T10:37:34.9265291Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/complex.h' 2025-09-07T10:37:34.9265854Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/constants.h' 2025-09-07T10:37:34.9266597Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/coord.h' 2025-09-07T10:37:34.9267136Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/core_io.h' 2025-09-07T10:37:34.9267769Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/cuda_host_adapter.hpp' 2025-09-07T10:37:34.9268389Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/cutlass.h' 2025-09-07T10:37:34.9268981Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/device_kernel.h' 2025-09-07T10:37:34.9269589Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/exmy_base.h' 2025-09-07T10:37:34.9270160Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/fast_math.h' 
2025-09-07T10:37:34.9270784Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/float8.h' 2025-09-07T10:37:34.9271366Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/float_subbyte.h' 2025-09-07T10:37:34.9272034Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/floating_point_nvrtc.h' 2025-09-07T10:37:34.9272676Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/functional.h' 2025-09-07T10:37:34.9273268Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm_coord.h' 2025-09-07T10:37:34.9273873Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm_coord.hpp' 2025-09-07T10:37:34.9274443Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/half.h' 2025-09-07T10:37:34.9275029Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/integer_subbyte.h' 2025-09-07T10:37:34.9275692Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.h' 2025-09-07T10:37:34.9276425Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.hpp' 2025-09-07T10:37:34.9277094Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/kernel_launch.h' 2025-09-07T10:37:34.9277736Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/matrix.h' 2025-09-07T10:37:34.9278325Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/matrix_coord.h' 2025-09-07T10:37:34.9278930Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/matrix_shape.h' 2025-09-07T10:37:34.9279639Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/numeric_conversion.h' 2025-09-07T10:37:34.9280279Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/numeric_size.h' 2025-09-07T10:37:34.9280905Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/numeric_types.h' 2025-09-07T10:37:34.9281560Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/pitch_linear_coord.h' 2025-09-07T10:37:34.9282221Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/predicate_vector.h' 
2025-09-07T10:37:34.9282854Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/quaternion.h' 2025-09-07T10:37:34.9283415Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/real.h' 2025-09-07T10:37:34.9284017Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/relatively_equal.h' 2025-09-07T10:37:34.9284635Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/semaphore.h' 2025-09-07T10:37:34.9285268Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/subbyte_reference.h' 2025-09-07T10:37:34.9285917Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/tensor_coord.h' 2025-09-07T10:37:34.9286510Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/tensor_ref.h' 2025-09-07T10:37:34.9287190Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/tensor_ref_planar_complex.h' 2025-09-07T10:37:34.9287868Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/tensor_view.h' 2025-09-07T10:37:34.9288558Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/tensor_view_planar_complex.h' 2025-09-07T10:37:34.9289229Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/tfloat32.h' 2025-09-07T10:37:34.9289790Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/trace.h' 2025-09-07T10:37:34.9290344Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/uint128.h' 2025-09-07T10:37:34.9291010Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/version.h' 2025-09-07T10:37:34.9291768Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/wmma_array.h' 2025-09-07T10:37:34.9292436Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/workspace.h' 2025-09-07T10:37:34.9293041Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/arch.h' 2025-09-07T10:37:34.9293645Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/barrier.h' 2025-09-07T10:37:34.9294326Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/cache_operation.h' 2025-09-07T10:37:34.9295005Z #47 903.8 
adding 'flashinfer/data/cutlass/include/cutlass/arch/config.h' 2025-09-07T10:37:34.9295751Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/grid_dependency_control.h' 2025-09-07T10:37:34.9296470Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/memory.h' 2025-09-07T10:37:34.9297104Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/memory_sm75.h' 2025-09-07T10:37:34.9297779Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/memory_sm80.h' 2025-09-07T10:37:34.9298414Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma.h' 2025-09-07T10:37:34.9299022Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm100.h' 2025-09-07T10:37:34.9299675Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm50.h' 2025-09-07T10:37:34.9300303Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm60.h' 2025-09-07T10:37:34.9300942Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm61.h' 2025-09-07T10:37:34.9301568Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm70.h' 2025-09-07T10:37:34.9302213Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm75.h' 2025-09-07T10:37:34.9302881Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm80.h' 2025-09-07T10:37:34.9303618Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm89.h' 2025-09-07T10:37:34.9304237Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm90.h' 2025-09-07T10:37:34.9304886Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm80.h' 2025-09-07T10:37:34.9305621Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm89.h' 2025-09-07T10:37:34.9306299Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/reg_reconfig.h' 2025-09-07T10:37:34.9306923Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/simd.h' 2025-09-07T10:37:34.9307525Z #47 
903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/simd_sm60.h' 2025-09-07T10:37:34.9308142Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/simd_sm61.h' 2025-09-07T10:37:34.9308779Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/synclog.hpp' 2025-09-07T10:37:34.9309379Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma.h' 2025-09-07T10:37:34.9309985Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma_sm70.h' 2025-09-07T10:37:34.9310605Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma_sm72.h' 2025-09-07T10:37:34.9311238Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma_sm75.h' 2025-09-07T10:37:34.9311925Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/conv2d_problem_size.h' 2025-09-07T10:37:34.9312653Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/conv3d_problem_size.h' 2025-09-07T10:37:34.9313414Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/convnd_problem_shape.hpp' 2025-09-07T10:37:34.9314124Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/convolution.h' 2025-09-07T10:37:34.9314769Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/detail.hpp' 2025-09-07T10:37:34.9315446Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/dispatch_policy.hpp' 2025-09-07T10:37:34.9316247Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/collective_builder.hpp' 2025-09-07T10:37:34.9317186Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/collective_conv.hpp' 2025-09-07T10:37:34.9317995Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/detail.hpp' 2025-09-07T10:37:34.9318939Z #47 903.8 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp' 2025-09-07T10:37:34.9320105Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp' 2025-09-07T10:37:34.9321140Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_common.inl' 2025-09-07T10:37:34.9322127Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_umma_builder.inl' 2025-09-07T10:37:34.9323080Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_common.inl' 2025-09-07T10:37:34.9324012Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl' 2025-09-07T10:37:34.9324945Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/conv_universal_adapter.hpp' 2025-09-07T10:37:34.9325785Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/direct_convolution.h' 2025-09-07T10:37:34.9326650Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution.h' 2025-09-07T10:37:34.9327583Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h' 2025-09-07T10:37:34.9328451Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/conv_universal.hpp' 2025-09-07T10:37:34.9329240Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d.h' 2025-09-07T10:37:34.9330066Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_dgrad.h' 2025-09-07T10:37:34.9330903Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h' 2025-09-07T10:37:34.9332058Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h' 2025-09-07T10:37:34.9333055Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h' 2025-09-07T10:37:34.9334077Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h' 2025-09-07T10:37:34.9335096Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h' 2025-09-07T10:37:34.9336079Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_group_fprop.h' 2025-09-07T10:37:34.9336986Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad.h' 2025-09-07T10:37:34.9337877Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h' 2025-09-07T10:37:34.9338779Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_dgrad.h' 2025-09-07T10:37:34.9339630Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop.h' 2025-09-07T10:37:34.9340538Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h' 2025-09-07T10:37:34.9341528Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h' 2025-09-07T10:37:34.9342463Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_wgrad.h' 2025-09-07T10:37:34.9343414Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d.h' 2025-09-07T10:37:34.9344283Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h' 2025-09-07T10:37:34.9345158Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d.h' 2025-09-07T10:37:34.9346020Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h' 2025-09-07T10:37:34.9346967Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_depthwise_fprop.h' 2025-09-07T10:37:34.9347815Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/conv/kernel/direct_convolution.h' 2025-09-07T10:37:34.9348915Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution.h' 2025-09-07T10:37:34.9350054Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h' 2025-09-07T10:37:34.9351096Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h' 2025-09-07T10:37:34.9352250Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h' 2025-09-07T10:37:34.9353374Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h' 2025-09-07T10:37:34.9354508Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp' 2025-09-07T10:37:34.9355617Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp' 2025-09-07T10:37:34.9356558Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/thread/depthwise_mma.h' 2025-09-07T10:37:34.9357599Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9358885Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9360204Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9361732Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9363052Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h' 
2025-09-07T10:37:34.9364408Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h' 2025-09-07T10:37:34.9365846Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h' 2025-09-07T10:37:34.9367114Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9368341Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9369667Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h' 2025-09-07T10:37:34.9370835Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h' 2025-09-07T10:37:34.9372351Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9373417Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_params.h' 2025-09-07T10:37:34.9374292Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_tile_iterator.h' 2025-09-07T10:37:34.9375418Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9376728Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9378137Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9380141Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9381461Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9382740Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9384145Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9385479Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9386833Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9388108Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9389370Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9390609Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9391624Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_params.h' 2025-09-07T10:37:34.9392671Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9393942Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9395283Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T10:37:34.9396625Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T10:37:34.9397813Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h' 2025-09-07T10:37:34.9399109Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h' 2025-09-07T10:37:34.9400622Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h' 2025-09-07T10:37:34.9401918Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h' 2025-09-07T10:37:34.9403188Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h' 2025-09-07T10:37:34.9404421Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h' 2025-09-07T10:37:34.9405346Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_base.h' 2025-09-07T10:37:34.9406341Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h' 2025-09-07T10:37:34.9407506Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h' 2025-09-07T10:37:34.9408528Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_multistage.h' 2025-09-07T10:37:34.9409461Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h' 2025-09-07T10:37:34.9410480Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h' 2025-09-07T10:37:34.9411890Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h' 2025-09-07T10:37:34.9413128Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h' 2025-09-07T10:37:34.9414159Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/threadblock_swizzle.h' 2025-09-07T10:37:34.9415022Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt.h' 2025-09-07T10:37:34.9415921Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h' 2025-09-07T10:37:35.0277558Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/conv/warp/scale_bias_relu_transform.h' 2025-09-07T10:37:35.0278633Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/blockwise_scale_layout.hpp' 2025-09-07T10:37:35.0279380Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/cluster.hpp' 2025-09-07T10:37:35.0280069Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/collective.hpp' 2025-09-07T10:37:35.0280790Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/dependent_false.hpp' 2025-09-07T10:37:35.0281532Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/helper_macros.hpp' 2025-09-07T10:37:35.0282223Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/layout.hpp' 2025-09-07T10:37:35.0283020Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp' 2025-09-07T10:37:35.0283814Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/mma.hpp' 2025-09-07T10:37:35.0284541Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/sm100_blockscaled_layout.hpp' 2025-09-07T10:37:35.0285352Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/detail/sm100_tmem_helper.hpp' 2025-09-07T10:37:35.0286278Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/detail/collective/mixed_input_utils.hpp' 2025-09-07T10:37:35.0287119Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/dispatch_policy.hpp' 2025-09-07T10:37:35.0287995Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_builder.hpp' 2025-09-07T10:37:35.0288985Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_epilogue.hpp' 2025-09-07T10:37:35.0289930Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue.hpp' 2025-09-07T10:37:35.0290885Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue_array.hpp' 2025-09-07T10:37:35.0292091Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/detail.hpp' 2025-09-07T10:37:35.0293053Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp' 2025-09-07T10:37:35.0294112Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp' 2025-09-07T10:37:35.0295315Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0296448Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp' 2025-09-07T10:37:35.0297538Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0298632Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp' 2025-09-07T10:37:35.0299707Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp' 2025-09-07T10:37:35.0300878Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0302071Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0303450Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp' 2025-09-07T10:37:35.0307194Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl' 2025-09-07T10:37:35.0308220Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_builder.inl' 2025-09-07T10:37:35.0309186Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_common.inl' 2025-09-07T10:37:35.0310160Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_builder.inl' 2025-09-07T10:37:35.0311129Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_common.inl' 2025-09-07T10:37:35.0312049Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/callbacks.hpp' 2025-09-07T10:37:35.0312855Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/operations.hpp' 2025-09-07T10:37:35.0313799Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0317011Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0318174Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0319287Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0320399Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0321494Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0322609Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0323739Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0324889Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0325978Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp' 2025-09-07T10:37:35.0326974Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp' 2025-09-07T10:37:35.0327843Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/activation.h' 2025-09-07T10:37:35.0328639Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/conversion_op.h' 2025-09-07T10:37:35.0329409Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/detail.hpp' 2025-09-07T10:37:35.0330218Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination.h' 2025-09-07T10:37:35.0331272Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h' 2025-09-07T10:37:35.0332498Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_relu.h' 2025-09-07T10:37:35.0333478Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_clamp.h' 2025-09-07T10:37:35.0334425Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_dgelu.h' 
2025-09-07T10:37:35.0335379Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_drelu.h' 2025-09-07T10:37:35.0336317Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_gelu.h' 2025-09-07T10:37:35.0337286Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic.h' 2025-09-07T10:37:35.0338337Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h' 2025-09-07T10:37:35.0339453Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_hardswish.h' 2025-09-07T10:37:35.0340558Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h' 2025-09-07T10:37:35.0341545Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_params.h' 2025-09-07T10:37:35.0342571Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_planar_complex.h' 2025-09-07T10:37:35.0343674Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu.h' 2025-09-07T10:37:35.0344630Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu0.h' 2025-09-07T10:37:35.0345608Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_residual_block.h' 2025-09-07T10:37:35.0346594Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_sigmoid.h' 2025-09-07T10:37:35.0347534Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_silu.h' 2025-09-07T10:37:35.0348521Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp' 2025-09-07T10:37:35.0349983Z #47 903.9 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h' 2025-09-07T10:37:35.0350943Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/reduction_op.h' 2025-09-07T10:37:35.0351746Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/scale_type.h' 2025-09-07T10:37:35.0352733Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h' 2025-09-07T10:37:35.0353922Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h' 2025-09-07T10:37:35.0355083Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h' 2025-09-07T10:37:35.0356259Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h' 2025-09-07T10:37:35.0357304Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h' 2025-09-07T10:37:35.0358327Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h' 2025-09-07T10:37:35.0359412Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h' 2025-09-07T10:37:35.0360522Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h' 2025-09-07T10:37:35.0361726Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h' 2025-09-07T10:37:35.0362778Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h' 2025-09-07T10:37:35.0363856Z #47 903.9 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h' 2025-09-07T10:37:35.0364926Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h' 2025-09-07T10:37:35.0365935Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_simt.h' 2025-09-07T10:37:35.0367205Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h' 2025-09-07T10:37:35.0368270Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h' 2025-09-07T10:37:35.0369378Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h' 2025-09-07T10:37:35.0370510Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h' 2025-09-07T10:37:35.0371697Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue.h' 2025-09-07T10:37:35.0372646Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base.h' 2025-09-07T10:37:35.0373585Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h' 2025-09-07T10:37:35.0374562Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_depthwise.h' 2025-09-07T10:37:35.0375537Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_direct_store.h' 2025-09-07T10:37:35.0376586Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h' 2025-09-07T10:37:35.0377613Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h' 2025-09-07T10:37:35.0378632Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h' 2025-09-07T10:37:35.0379710Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h' 
2025-09-07T10:37:35.0380809Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h' 2025-09-07T10:37:35.0381830Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h' 2025-09-07T10:37:35.0382977Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h' 2025-09-07T10:37:35.0383948Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h' 2025-09-07T10:37:35.0384920Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h' 2025-09-07T10:37:35.0385946Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h' 2025-09-07T10:37:35.0386935Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_workspace.h' 2025-09-07T10:37:35.0387909Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/interleaved_epilogue.h' 2025-09-07T10:37:35.0388885Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_iterator_parameter.h' 2025-09-07T10:37:35.0389873Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h' 2025-09-07T10:37:35.0390859Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h' 2025-09-07T10:37:35.0391889Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h' 2025-09-07T10:37:35.0393053Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h' 2025-09-07T10:37:35.0394206Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h' 2025-09-07T10:37:35.0395272Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h' 2025-09-07T10:37:35.0396371Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h' 2025-09-07T10:37:35.0397469Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h' 2025-09-07T10:37:35.0398579Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h' 2025-09-07T10:37:35.0399722Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h' 2025-09-07T10:37:35.0400782Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h' 2025-09-07T10:37:35.0401795Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h' 2025-09-07T10:37:35.0402873Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h' 2025-09-07T10:37:35.0403897Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp' 2025-09-07T10:37:35.0404854Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp' 2025-09-07T10:37:35.0405842Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp' 2025-09-07T10:37:35.0406849Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp' 2025-09-07T10:37:35.0407784Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitors.hpp' 2025-09-07T10:37:35.0408762Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h' 2025-09-07T10:37:35.0409844Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h' 2025-09-07T10:37:35.0410867Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_simt.h' 2025-09-07T10:37:35.0412052Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h' 2025-09-07T10:37:35.0413047Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h' 2025-09-07T10:37:35.0414088Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h' 2025-09-07T10:37:35.0414990Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/simt_policy.h' 2025-09-07T10:37:35.0415815Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tensor_op_policy.h' 2025-09-07T10:37:35.0416680Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h' 2025-09-07T10:37:35.0417567Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h' 2025-09-07T10:37:35.0418568Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h' 2025-09-07T10:37:35.0419550Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h' 2025-09-07T10:37:35.0420542Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h' 2025-09-07T10:37:35.0421488Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/volta_tensor_op_policy.h' 2025-09-07T10:37:35.0422386Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h' 2025-09-07T10:37:35.0423328Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/detail.hpp' 2025-09-07T10:37:35.0424509Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp' 2025-09-07T10:37:35.0425648Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/full_barrier.hpp' 2025-09-07T10:37:35.0426624Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/detail.hpp' 2025-09-07T10:37:35.0427667Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp' 2025-09-07T10:37:35.0428758Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/full_barrier.hpp' 2025-09-07T10:37:35.0429841Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp' 2025-09-07T10:37:35.0431025Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp' 2025-09-07T10:37:35.0431982Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/dispatch_policy.hpp' 2025-09-07T10:37:35.0432661Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/gemm.h' 2025-09-07T10:37:35.0433373Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/gemm_enumerated_types.h' 2025-09-07T10:37:35.0434167Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/group_array_problem_shape.hpp' 2025-09-07T10:37:35.0435028Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder.hpp' 2025-09-07T10:37:35.0435948Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder_decl.hpp' 2025-09-07T10:37:35.0436831Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma.hpp' 2025-09-07T10:37:35.0437728Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma_decl.hpp' 2025-09-07T10:37:35.0438600Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/collective/fp8_accumulation.hpp' 2025-09-07T10:37:35.0439634Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp' 2025-09-07T10:37:35.0440784Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp' 2025-09-07T10:37:35.0441924Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp' 2025-09-07T10:37:35.0443037Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp' 2025-09-07T10:37:35.0444162Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp' 2025-09-07T10:37:35.0445354Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp' 2025-09-07T10:37:35.1281580Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp' 2025-09-07T10:37:35.1282756Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp' 2025-09-07T10:37:35.1284143Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp' 2025-09-07T10:37:35.1285225Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp' 2025-09-07T10:37:35.1286283Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp' 2025-09-07T10:37:35.1287268Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp' 2025-09-07T10:37:35.1288270Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp' 2025-09-07T10:37:35.1289324Z #47 
904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_array_tma_blockwise_scaling.hpp' 2025-09-07T10:37:35.1290290Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma.hpp' 2025-09-07T10:37:35.1291502Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma_blockwise_scaling.hpp' 2025-09-07T10:37:35.1292494Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp' 2025-09-07T10:37:35.1293416Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm70_mma_twostage.hpp' 2025-09-07T10:37:35.1294351Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_array_multistage.hpp' 2025-09-07T10:37:35.1295317Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_multistage.hpp' 2025-09-07T10:37:35.1296444Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp' 2025-09-07T10:37:35.1297695Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T10:37:35.1299021Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T10:37:35.1300402Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp' 2025-09-07T10:37:35.1301736Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp' 2025-09-07T10:37:35.1302951Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp' 2025-09-07T10:37:35.1304250Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp' 
2025-09-07T10:37:35.1305414Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp' 2025-09-07T10:37:35.1306467Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp' 2025-09-07T10:37:35.1307473Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T10:37:35.1308594Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T10:37:35.1309907Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp' 2025-09-07T10:37:35.1311154Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T10:37:35.1312361Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T10:37:35.1313563Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl' 2025-09-07T10:37:35.1314713Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl' 2025-09-07T10:37:35.1315917Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl' 2025-09-07T10:37:35.1317039Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl' 2025-09-07T10:37:35.1318055Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_common.inl' 2025-09-07T10:37:35.1319031Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl' 2025-09-07T10:37:35.1320062Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_simt_builder.inl' 2025-09-07T10:37:35.1321079Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl' 2025-09-07T10:37:35.1322107Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl' 2025-09-07T10:37:35.1323167Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl' 2025-09-07T10:37:35.1324325Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl' 2025-09-07T10:37:35.1325477Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockwise_mma_builder.inl' 2025-09-07T10:37:35.1326473Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_common.inl' 2025-09-07T10:37:35.1327418Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl' 2025-09-07T10:37:35.1328430Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl' 2025-09-07T10:37:35.1329403Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_common.inl' 2025-09-07T10:37:35.1330399Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl' 2025-09-07T10:37:35.1331649Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_common.inl' 2025-09-07T10:37:35.1332630Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl' 2025-09-07T10:37:35.1333643Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl' 2025-09-07T10:37:35.1334689Z #47 904.0 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl' 2025-09-07T10:37:35.1335658Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/base_grouped.h' 2025-09-07T10:37:35.1336502Z #47 904.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/default_gemm_configuration.h' 2025-09-07T10:37:35.1337340Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/ell_gemm.h' 2025-09-07T10:37:35.1338055Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm.h' 2025-09-07T10:37:35.1338758Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_array.h' 2025-09-07T10:37:35.1339516Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_batched.h' 2025-09-07T10:37:35.1340271Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_complex.h' 2025-09-07T10:37:35.1341039Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_grouped.h' 2025-09-07T10:37:35.1341919Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h' 2025-09-07T10:37:35.1343011Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse.h' 2025-09-07T10:37:35.1343812Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal.h' 2025-09-07T10:37:35.1344721Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h' 2025-09-07T10:37:35.1345701Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_absmax.h' 2025-09-07T10:37:35.1346564Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_visitor.h' 2025-09-07T10:37:35.1347429Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_splitk_parallel.h' 2025-09-07T10:37:35.1348237Z #47 904.1 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal.h' 2025-09-07T10:37:35.1349386Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h' 2025-09-07T10:37:35.1350265Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_base.h' 2025-09-07T10:37:35.1351209Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h' 2025-09-07T10:37:35.1352214Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_absmax.h' 2025-09-07T10:37:35.1353169Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_broadcast.h' 2025-09-07T10:37:35.1354080Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_with_k_reduction.h' 2025-09-07T10:37:35.1354864Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemv.h' 2025-09-07T10:37:35.1355546Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k.h' 2025-09-07T10:37:35.1356305Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k_grouped.h' 2025-09-07T10:37:35.1357068Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/rank_k.h' 2025-09-07T10:37:35.1357744Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/symm.h' 2025-09-07T10:37:35.1358417Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/trmm.h' 2025-09-07T10:37:35.1359216Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_ell_gemm.h' 2025-09-07T10:37:35.1360003Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm.h' 2025-09-07T10:37:35.1360966Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_complex.h' 2025-09-07T10:37:35.1361791Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped.h' 
2025-09-07T10:37:35.1362718Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h' 2025-09-07T10:37:35.1363768Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h' 2025-09-07T10:37:35.1364885Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h' 2025-09-07T10:37:35.1365915Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h' 2025-09-07T10:37:35.1366833Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse.h' 2025-09-07T10:37:35.1367717Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h' 2025-09-07T10:37:35.1368803Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h' 2025-09-07T10:37:35.1369821Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h' 2025-09-07T10:37:35.1370765Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h' 2025-09-07T10:37:35.1372048Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h' 2025-09-07T10:37:35.1373043Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h' 2025-09-07T10:37:35.1373989Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal.h' 2025-09-07T10:37:35.1374952Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h' 2025-09-07T10:37:35.1375958Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_absmax.h' 2025-09-07T10:37:35.1376880Z #47 904.1 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h' 2025-09-07T10:37:35.1377830Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h' 2025-09-07T10:37:35.1378768Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_reduction.h' 2025-09-07T10:37:35.1379620Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemv.h' 2025-09-07T10:37:35.1380398Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k.h' 2025-09-07T10:37:35.1381252Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_complex.h' 2025-09-07T10:37:35.1382148Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_grouped.h' 2025-09-07T10:37:35.1383043Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_universal.h' 2025-09-07T10:37:35.1384008Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k.h' 2025-09-07T10:37:35.1384814Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_complex.h' 2025-09-07T10:37:35.1385683Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_universal.h' 2025-09-07T10:37:35.1386489Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm.h' 2025-09-07T10:37:35.1387271Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_complex.h' 2025-09-07T10:37:35.1388120Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_universal.h' 2025-09-07T10:37:35.1388943Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm.h' 2025-09-07T10:37:35.1389769Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_complex.h' 2025-09-07T10:37:35.1390614Z #47 904.1 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_universal.h' 2025-09-07T10:37:35.1391384Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/ell_gemm.h' 2025-09-07T10:37:35.1392064Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm.h' 2025-09-07T10:37:35.1392741Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_array.h' 2025-09-07T10:37:35.1393507Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_batched.h' 2025-09-07T10:37:35.1394238Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped.h' 2025-09-07T10:37:35.1395070Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h' 2025-09-07T10:37:35.1395999Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h' 2025-09-07T10:37:35.1396953Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h' 2025-09-07T10:37:35.1397958Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h' 2025-09-07T10:37:35.1398785Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_params.h' 2025-09-07T10:37:35.1399539Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_pipelined.h' 2025-09-07T10:37:35.1400339Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex.h' 2025-09-07T10:37:35.1401186Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex_array.h' 2025-09-07T10:37:35.1402057Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal.h' 2025-09-07T10:37:35.1403048Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h' 2025-09-07T10:37:35.1404007Z #47 904.1 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_splitk_parallel.h' 2025-09-07T10:37:35.1404906Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h' 2025-09-07T10:37:35.1405811Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_transpose_operands.h' 2025-09-07T10:37:35.1406632Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.h' 2025-09-07T10:37:35.1407402Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.hpp' 2025-09-07T10:37:35.1408211Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_decl.h' 2025-09-07T10:37:35.1409051Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_streamk.h' 2025-09-07T10:37:35.1409924Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h' 2025-09-07T10:37:35.1410879Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h' 2025-09-07T10:37:35.1412040Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_absmax.h' 2025-09-07T10:37:35.1412897Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h' 2025-09-07T10:37:35.1413789Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_k_reduction.h' 2025-09-07T10:37:35.1414550Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv.h' 2025-09-07T10:37:35.3697926Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv_batched_strided.h' 2025-09-07T10:37:35.3698874Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/grouped_problem_visitor.h' 2025-09-07T10:37:35.3699906Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_sparse_base.h' 2025-09-07T10:37:35.3711264Z #47 904.1 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_universal_base.h' 2025-09-07T10:37:35.3712305Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped.h' 2025-09-07T10:37:35.3713181Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h' 2025-09-07T10:37:35.3714119Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h' 2025-09-07T10:37:35.3714976Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_universal.h' 2025-09-07T10:37:35.3715813Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_k_universal.h' 2025-09-07T10:37:35.3716737Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp' 2025-09-07T10:37:35.3717864Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp' 2025-09-07T10:37:35.3719078Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp' 2025-09-07T10:37:35.3720159Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp' 2025-09-07T10:37:35.3721212Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp' 2025-09-07T10:37:35.3722352Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp' 2025-09-07T10:37:35.3723432Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp' 2025-09-07T10:37:35.3724425Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp' 2025-09-07T10:37:35.3725323Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp' 
2025-09-07T10:37:35.3726201Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp' 2025-09-07T10:37:35.3727189Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp' 2025-09-07T10:37:35.3728290Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp' 2025-09-07T10:37:35.3729290Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm.hpp' 2025-09-07T10:37:35.3730056Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm_array.hpp' 2025-09-07T10:37:35.3731145Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp' 2025-09-07T10:37:35.3732508Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp' 2025-09-07T10:37:35.3733480Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp' 2025-09-07T10:37:35.3734385Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp' 2025-09-07T10:37:35.3735451Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp' 2025-09-07T10:37:35.3736553Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp' 2025-09-07T10:37:35.3737571Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp' 2025-09-07T10:37:35.3738576Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp' 2025-09-07T10:37:35.3739643Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp' 2025-09-07T10:37:35.3740606Z #47 904.1 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp' 2025-09-07T10:37:35.3741546Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp' 2025-09-07T10:37:35.3742540Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp' 2025-09-07T10:37:35.3743497Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm.h' 2025-09-07T10:37:35.3744303Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h' 2025-09-07T10:37:35.3745180Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h' 2025-09-07T10:37:35.3746076Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/static_tile_scheduler.hpp' 2025-09-07T10:37:35.3746902Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/symm_universal.h' 2025-09-07T10:37:35.3747676Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler.hpp' 2025-09-07T10:37:35.3748509Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp' 2025-09-07T10:37:35.3749735Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_params.h' 2025-09-07T10:37:35.3750566Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/trmm_universal.h' 2025-09-07T10:37:35.3751303Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma.h' 2025-09-07T10:37:35.3751990Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm50.h' 2025-09-07T10:37:35.3752715Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm60.h' 2025-09-07T10:37:35.3753433Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm61.h' 2025-09-07T10:37:35.3754228Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_ell_mma.h' 
2025-09-07T10:37:35.3755112Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_gemv_core.h' 2025-09-07T10:37:35.3755957Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma.h' 2025-09-07T10:37:35.3756874Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core.h' 2025-09-07T10:37:35.3757772Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_simt.h' 2025-09-07T10:37:35.3758709Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm70.h' 2025-09-07T10:37:35.3759644Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm75.h' 2025-09-07T10:37:35.3760568Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm80.h' 2025-09-07T10:37:35.3761547Z #47 904.1 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h' 2025-09-07T10:37:35.3762665Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h' 2025-09-07T10:37:35.3763710Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h' 2025-09-07T10:37:35.3764677Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_wmma.h' 2025-09-07T10:37:35.3765660Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h' 2025-09-07T10:37:35.3766758Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h' 2025-09-07T10:37:35.3767837Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h' 2025-09-07T10:37:35.3768908Z #47 904.2 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h' 2025-09-07T10:37:35.3769918Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_with_reduction.h' 2025-09-07T10:37:35.3771019Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h' 2025-09-07T10:37:35.3772308Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h' 2025-09-07T10:37:35.3773432Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h' 2025-09-07T10:37:35.3774551Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h' 2025-09-07T10:37:35.3775534Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_sparse_mma.h' 2025-09-07T10:37:35.3776438Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_trmm.h' 2025-09-07T10:37:35.3777334Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_multistage.h' 2025-09-07T10:37:35.3778234Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_pipelined.h' 2025-09-07T10:37:35.3779049Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/gemv.h' 2025-09-07T10:37:35.3779833Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/index_remat.h' 2025-09-07T10:37:35.3780627Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_base.h' 2025-09-07T10:37:35.3781483Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_blas3_multistage.h' 2025-09-07T10:37:35.3782532Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h' 2025-09-07T10:37:35.3783634Z #47 904.2 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_multistage.h' 2025-09-07T10:37:35.3784457Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_pipelined.h' 2025-09-07T10:37:35.3785321Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_base.h' 2025-09-07T10:37:35.3786297Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h' 2025-09-07T10:37:35.3787329Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h' 2025-09-07T10:37:35.3788235Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_singlestage.h' 2025-09-07T10:37:35.3789212Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h' 2025-09-07T10:37:35.3790168Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_base.h' 2025-09-07T10:37:35.3791045Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_multistage.h' 2025-09-07T10:37:35.3792004Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h' 2025-09-07T10:37:35.3792939Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle.h' 2025-09-07T10:37:35.3793872Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h' 2025-09-07T10:37:35.3794808Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h' 2025-09-07T10:37:35.3795715Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h' 2025-09-07T10:37:35.3796564Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op.h' 2025-09-07T10:37:35.3797410Z #47 904.2 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h' 2025-09-07T10:37:35.3798354Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h' 2025-09-07T10:37:35.3799276Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h' 2025-09-07T10:37:35.3800215Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h' 2025-09-07T10:37:35.3800981Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma.h' 2025-09-07T10:37:35.3801743Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op.h' 2025-09-07T10:37:35.3802620Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h' 2025-09-07T10:37:35.3803586Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h' 2025-09-07T10:37:35.3804564Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h' 2025-09-07T10:37:35.3805606Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h' 2025-09-07T10:37:35.3806599Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h' 2025-09-07T10:37:35.3807435Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_planar_complex.h' 2025-09-07T10:37:35.3808166Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt.h' 2025-09-07T10:37:35.3808893Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_policy.h' 2025-09-07T10:37:35.3809684Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_tile_iterator.h' 2025-09-07T10:37:35.3810506Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_sparse_tensor_op.h' 2025-09-07T10:37:35.3811536Z 
#47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h' 2025-09-07T10:37:35.3812330Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h' 2025-09-07T10:37:35.3813247Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h' 2025-09-07T10:37:35.3814131Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h' 2025-09-07T10:37:35.3814959Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_sm70.h' 2025-09-07T10:37:35.3815921Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h' 2025-09-07T10:37:35.3816862Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h' 2025-09-07T10:37:35.3817803Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h' 2025-09-07T10:37:35.3818759Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h' 2025-09-07T10:37:35.3819737Z #47 904.2 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h' 2025-09-07T10:37:35.3820403Z #47 904.2 adding 'flashinfer 2025-09-07T10:37:35.3820759Z #47 904.2 [output clipped, log limit 2MiB reached] 2025-09-07T10:37:37.8120963Z #47 DONE 906.8s 2025-09-07T10:37:37.9651195Z 2025-09-07T10:37:37.9652160Z #48 [vllm-base 16/18] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system wheels/flashinfer/*.whl --verbose 2025-09-07T10:37:38.4080040Z #48 0.594 DEBUG uv 0.8.4 2025-09-07T10:37:38.5905355Z #48 0.594 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T10:37:38.5906777Z #48 0.594 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T10:37:38.5908285Z #48 0.596 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at 
`/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T10:37:38.5909657Z #48 0.596 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T10:37:38.5910541Z #48 0.597 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T10:37:38.5912120Z #48 0.602 DEBUG At least one requirement is not satisfied: file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T10:37:38.5913796Z #48 0.603 DEBUG Using request timeout of 500s 2025-09-07T10:37:38.5914604Z #48 0.608 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T10:37:38.5915478Z #48 0.608 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T10:37:38.5916480Z #48 0.608 DEBUG Adding direct dependency: flashinfer-python* 2025-09-07T10:37:38.5918285Z #48 0.608 DEBUG Searching for a compatible version of flashinfer-python @ file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl (*) 2025-09-07T10:37:38.5920370Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: numpy* 2025-09-07T10:37:38.5921707Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: torch* 2025-09-07T10:37:38.5923139Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: ninja* 2025-09-07T10:37:38.5924527Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: requests* 2025-09-07T10:37:38.5925934Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: cuda-python<=12.9+ 2025-09-07T10:37:38.5927417Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: pynvml* 2025-09-07T10:37:38.5928352Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: einops* 2025-09-07T10:37:38.5929172Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: packaging>=24.2 2025-09-07T10:37:38.5930105Z #48 
0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: nvidia-cudnn-frontend>=1.13.0 2025-09-07T10:37:38.5931217Z #48 0.608 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: torch>=2.9.dev0, <2.10.dev0 2025-09-07T10:37:38.5932020Z #48 0.610 DEBUG No cache entry for: https://pypi.org/simple/cuda-python/ 2025-09-07T10:37:38.5932623Z #48 0.610 DEBUG No cache entry for: https://pypi.org/simple/pynvml/ 2025-09-07T10:37:38.5933221Z #48 0.610 DEBUG Found stale response for: https://pypi.org/simple/ninja/ 2025-09-07T10:37:38.5933881Z #48 0.610 DEBUG Sending revalidation request for: https://pypi.org/simple/ninja/ 2025-09-07T10:37:38.5934565Z #48 0.610 DEBUG Found stale response for: https://pypi.org/simple/requests/ 2025-09-07T10:37:38.5935362Z #48 0.610 DEBUG Sending revalidation request for: https://pypi.org/simple/requests/ 2025-09-07T10:37:38.5936059Z #48 0.610 DEBUG Found stale response for: https://pypi.org/simple/einops/ 2025-09-07T10:37:38.5936719Z #48 0.610 DEBUG Sending revalidation request for: https://pypi.org/simple/einops/ 2025-09-07T10:37:38.5937399Z #48 0.610 DEBUG Found stale response for: https://pypi.org/simple/torch/ 2025-09-07T10:37:38.5938074Z #48 0.610 DEBUG Sending revalidation request for: https://pypi.org/simple/torch/ 2025-09-07T10:37:38.5938761Z #48 0.610 DEBUG No cache entry for: https://pypi.org/simple/nvidia-cudnn-frontend/ 2025-09-07T10:37:38.5939440Z #48 0.610 DEBUG Found stale response for: https://pypi.org/simple/packaging/ 2025-09-07T10:37:38.5940137Z #48 0.610 DEBUG Sending revalidation request for: https://pypi.org/simple/packaging/ 2025-09-07T10:37:38.5940823Z #48 0.612 DEBUG Found stale response for: https://pypi.org/simple/numpy/ 2025-09-07T10:37:38.5941489Z #48 0.612 DEBUG Sending revalidation request for: https://pypi.org/simple/numpy/ 2025-09-07T10:37:38.5942177Z #48 0.619 DEBUG Found not-modified response for: https://pypi.org/simple/numpy/ 2025-09-07T10:37:38.5942863Z #48 0.624 DEBUG 
Found not-modified response for: https://pypi.org/simple/ninja/ 2025-09-07T10:37:38.5943560Z #48 0.624 DEBUG Found not-modified response for: https://pypi.org/simple/requests/ 2025-09-07T10:37:38.5944273Z #48 0.624 DEBUG Found not-modified response for: https://pypi.org/simple/einops/ 2025-09-07T10:37:38.5944958Z #48 0.624 DEBUG Found not-modified response for: https://pypi.org/simple/torch/ 2025-09-07T10:37:38.5945666Z #48 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/packaging/ 2025-09-07T10:37:38.5946308Z #48 0.626 DEBUG Searching for a compatible version of numpy (*) 2025-09-07T10:37:38.5946930Z #48 0.626 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T10:37:38.5947500Z #48 0.626 DEBUG Selecting: numpy==2.2.6 [installed] (installed) 2025-09-07T10:37:38.5948102Z #48 0.626 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T10:37:38.5948935Z #48 0.627 DEBUG Found installed version of ninja==1.13.0 that satisfies * 2025-09-07T10:37:38.5949611Z #48 0.627 DEBUG Searching for a compatible version of torch (>=2.9.dev0, <2.10.dev0) 2025-09-07T10:37:38.5950840Z #48 0.627 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.9.dev0, <2.10.dev0 2025-09-07T10:37:38.5952111Z #48 0.627 DEBUG Selecting: torch==2.9.0.dev20250901+cu129 [installed] (installed) 2025-09-07T10:37:38.5952750Z #48 0.627 DEBUG Found installed version of requests==2.32.5 that satisfies * 2025-09-07T10:37:38.5953383Z #48 0.627 DEBUG Found installed version of einops==0.8.1 that satisfies * 2025-09-07T10:37:38.5954547Z #48 0.627 DEBUG Found installed version of torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.9.dev0, <2.10.dev0 2025-09-07T10:37:38.5955746Z #48 0.627 DEBUG Found installed version of packaging==25.0 that satisfies 
>=24.2 2025-09-07T10:37:38.5957065Z #48 0.627 DEBUG No cache entry for: https://files.pythonhosted.org/packages/24/3c/4475aebeaab9651f2e61000fbe76f91a476d371dbfbf0a1cf46e689af253/cuda_python-12.9.0-py3-none-any.whl.metadata 2025-09-07T10:37:38.5958400Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: filelock* 2025-09-07T10:37:38.5959247Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: typing-extensions>=4.10.0 2025-09-07T10:37:38.5960233Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: setuptools{python_full_version >= '3.12'}* 2025-09-07T10:37:38.5961262Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: sympy>=1.13.3 2025-09-07T10:37:38.5962100Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: networkx>=2.5.1 2025-09-07T10:37:38.5962943Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: jinja2* 2025-09-07T10:37:38.5965157Z #48 0.627 DEBUG No cache entry for: https://files.pythonhosted.org/packages/d7/4a/cac76c174bb439a0c46c9a4413fcbea5c6cabfb01879f7bbdb9fdfaed76c/pynvml-13.0.1-py3-none-any.whl.metadata 2025-09-07T10:37:38.5967449Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: fsspec>=0.8.5 2025-09-07T10:37:38.5969005Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T10:37:38.5970519Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:37:38.5972372Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 
2025-09-07T10:37:38.5973894Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T10:37:38.5975414Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.1.4, <12.9.1.4+ 2025-09-07T10:37:38.5976911Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.4.1.4, <11.4.1.4+ 2025-09-07T10:37:38.5978433Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.10.19, <10.3.10.19+ 2025-09-07T10:37:38.5980160Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.5.82, <11.7.5.82+ 2025-09-07T10:37:38.5981709Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.10.65, <12.5.10.65+ 2025-09-07T10:37:38.5983365Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T10:37:38.5985339Z #48 0.627 DEBUG No cache entry for: https://files.pythonhosted.org/packages/b7/b8/5f812452c653447b4c09fec3cf0c5192abab1ce18358fcfab16a70113cfa/nvidia_cudnn_frontend-1.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:37:38.5987359Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T10:37:38.5988794Z #48 0.627 DEBUG Adding 
transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T10:37:38.5990242Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.79, <12.9.79+ 2025-09-07T10:37:38.5991704Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.9.86, <12.9.86+ 2025-09-07T10:37:38.5993173Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.14.1.1, <1.14.1.1+ 2025-09-07T10:37:38.5994493Z #48 0.627 DEBUG Adding transitive dependency for torch==2.9.0.dev20250901+cu129: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T10:37:38.5995440Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/filelock/ 2025-09-07T10:37:38.5996107Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/filelock/ 2025-09-07T10:37:38.5996830Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:37:38.5997575Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:37:38.5998279Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/sympy/ 2025-09-07T10:37:38.5998911Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/sympy/ 2025-09-07T10:37:38.5999572Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/networkx/ 2025-09-07T10:37:38.6000401Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/networkx/ 2025-09-07T10:37:38.6001058Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/jinja2/ 2025-09-07T10:37:38.6001715Z #48 0.628 DEBUG Sending revalidation 
request for: https://pypi.org/simple/jinja2/ 2025-09-07T10:37:38.6002352Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/fsspec/ 2025-09-07T10:37:38.6003005Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/fsspec/ 2025-09-07T10:37:38.6003738Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:37:38.6004528Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:37:38.6005338Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:37:38.6006146Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:37:38.6006989Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:37:38.6007771Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:37:38.6008600Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:37:38.6009356Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:37:38.6010099Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:37:38.6010861Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:37:38.6011881Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:37:38.6012722Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:37:38.6013502Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:37:38.6014273Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:37:38.6015072Z #48 0.628 DEBUG Found 
stale response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:37:38.6015871Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:37:38.6016675Z #48 0.628 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:37:38.6017489Z #48 0.628 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:37:38.6018286Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:37:38.6019110Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:37:38.6019892Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:37:38.6020661Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:37:38.6021438Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:37:38.6022276Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:37:38.6023060Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:37:38.6023914Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:37:38.6024686Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:37:38.6025462Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:37:38.6026245Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:37:38.6027009Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:37:38.6027746Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/pytorch-triton/ 
2025-09-07T10:37:38.6028476Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T10:37:38.6029183Z #48 0.629 DEBUG Found stale response for: https://pypi.org/simple/setuptools/ 2025-09-07T10:37:38.6029882Z #48 0.629 DEBUG Sending revalidation request for: https://pypi.org/simple/setuptools/ 2025-09-07T10:37:38.6030602Z #48 0.629 DEBUG Found not-modified response for: https://pypi.org/simple/filelock/ 2025-09-07T10:37:38.6031324Z #48 0.629 DEBUG Found not-modified response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T10:37:38.6032254Z #48 0.629 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T10:37:38.6033419Z #48 0.630 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T10:37:38.6034446Z #48 0.630 DEBUG Found not-modified response for: https://pypi.org/simple/sympy/ 2025-09-07T10:37:38.6035118Z #48 0.630 DEBUG Found not-modified response for: https://pypi.org/simple/jinja2/ 2025-09-07T10:37:38.6036013Z #48 0.630 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T10:37:38.6037031Z #48 0.630 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T10:37:38.6037935Z #48 0.630 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T10:37:38.6038769Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T10:37:38.6039642Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T10:37:38.6040391Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/networkx/ 2025-09-07T10:37:38.6041146Z #48 0.631 DEBUG Found 
not-modified response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T10:37:38.6041930Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T10:37:38.6042715Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T10:37:38.6043484Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T10:37:38.6044212Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/fsspec/ 2025-09-07T10:37:38.6045200Z #48 0.631 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T10:37:38.6046234Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T10:37:38.6047543Z #48 0.631 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T10:37:38.6049072Z #48 0.631 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:37:38.6050090Z #48 0.631 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12==12.9.86 2025-09-07T10:37:38.6051436Z #48 0.631 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.9.86: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T10:37:38.6052550Z #48 0.631 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.9.86) 2025-09-07T10:37:38.6053819Z #48 0.631 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:37:38.6055053Z #48 0.631 DEBUG Selecting: 
nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:37:38.6055805Z #48 0.631 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T10:37:38.6056622Z #48 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T10:37:38.6057436Z #48 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T10:37:38.6058271Z #48 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T10:37:38.6059051Z #48 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/setuptools/ 2025-09-07T10:37:38.6059808Z #48 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T10:37:38.6060630Z #48 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T10:37:38.6061923Z #48 0.633 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:37:38.6063658Z #48 0.633 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T10:37:38.6065131Z #48 0.633 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:37:38.6066316Z #48 0.633 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.9.86 [installed] (installed) 2025-09-07T10:37:38.6067316Z #48 0.633 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:37:38.6068936Z #48 0.633 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from 
file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:37:38.6070194Z #48 0.633 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6071068Z #48 0.633 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12==12.9.79 2025-09-07T10:37:38.6072325Z #48 0.633 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.9.79: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:37:38.6073449Z #48 0.633 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.9.79) 2025-09-07T10:37:38.6074708Z #48 0.633 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6075932Z #48 0.633 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6076670Z #48 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T10:37:38.6077458Z #48 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T10:37:38.6078425Z #48 0.633 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T10:37:38.6079515Z #48 0.634 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T10:37:38.6080929Z #48 0.634 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6082431Z #48 0.634 DEBUG Searching for a compatible version of 
nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:37:38.6083933Z #48 0.634 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6085148Z #48 0.634 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6086150Z #48 0.634 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:37:38.6087592Z #48 0.634 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:37:38.6088719Z #48 0.634 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6089551Z #48 0.634 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12==12.9.79 2025-09-07T10:37:38.6090759Z #48 0.634 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.9.79: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:37:38.6092185Z #48 0.634 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 (==12.9.79) 2025-09-07T10:37:38.6093399Z #48 0.634 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6094523Z #48 0.634 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6095667Z #48 0.635 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 
2025-09-07T10:37:38.6097121Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:37:38.6098527Z #48 0.635 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6099667Z #48 0.635 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6100659Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T10:37:38.6102106Z #48 0.635 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T10:37:38.6103357Z #48 0.635 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:37:38.6104129Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T10:37:38.6105499Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T10:37:38.6107300Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T10:37:38.6109266Z #48 0.635 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:37:38.6110684Z #48 0.635 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:37:38.6111729Z #48 0.635 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 
2025-09-07T10:37:38.6112881Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T10:37:38.6113891Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T10:37:38.6115209Z #48 0.635 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T10:37:38.6116262Z #48 0.635 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T10:37:38.6117278Z #48 0.635 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T10:37:38.6118371Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T10:37:38.6119427Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.1.4, <12.9.1.4+) 2025-09-07T10:37:38.6120820Z #48 0.635 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.9.1.4, <12.9.1.4+ 2025-09-07T10:37:38.6122002Z #48 0.635 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:37:38.6122824Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12==12.9.1.4 2025-09-07T10:37:38.6123962Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.9.1.4: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.1.4 2025-09-07T10:37:38.6125001Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.9.1.4) 2025-09-07T10:37:38.6126078Z #48 0.635 DEBUG Found installed version of 
nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:37:38.6127584Z #48 0.635 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:37:38.6128640Z #48 0.635 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:37:38.6129555Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.1.4) 2025-09-07T10:37:38.6130968Z #48 0.635 DEBUG Found installed version of nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.9.1.4 2025-09-07T10:37:38.6132227Z #48 0.635 DEBUG Selecting: nvidia-cublas-cu12==12.9.1.4 [installed] (installed) 2025-09-07T10:37:38.6133199Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.4.1.4, <11.4.1.4+) 2025-09-07T10:37:38.6134734Z #48 0.635 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.4.1.4, <11.4.1.4+ 2025-09-07T10:37:38.6135947Z #48 0.635 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:37:38.6136726Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12==11.4.1.4 2025-09-07T10:37:38.6137929Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.4.1.4 2025-09-07T10:37:38.6138969Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.4.1.4) 2025-09-07T10:37:38.6140173Z #48 0.635 DEBUG Found installed version of 
nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:37:38.6141858Z #48 0.635 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:37:38.6143153Z #48 0.635 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:37:38.6143903Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T10:37:38.6144914Z #48 0.635 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.4.1.4) 2025-09-07T10:37:38.6146316Z #48 0.635 DEBUG Found installed version of nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.4.1.4 2025-09-07T10:37:38.6147446Z #48 0.635 DEBUG Selecting: nvidia-cufft-cu12==11.4.1.4 [installed] (installed) 2025-09-07T10:37:38.6148582Z #48 0.635 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T10:37:38.6150271Z #48 0.635 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.4.1.4: nvidia-nvjitlink-cu12* 2025-09-07T10:37:38.6151463Z #48 0.635 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.10.19, <10.3.10.19+) 2025-09-07T10:37:38.6153008Z #48 0.635 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.10.19, <10.3.10.19+ 2025-09-07T10:37:38.6154174Z #48 0.635 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 
[installed] (installed) 2025-09-07T10:37:38.6154995Z #48 0.635 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12==10.3.10.19 2025-09-07T10:37:38.6156262Z #48 0.635 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.10.19: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.10.19 2025-09-07T10:37:38.6157355Z #48 0.635 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.10.19) 2025-09-07T10:37:38.6158491Z #48 0.635 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:37:38.6160029Z #48 0.635 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:37:38.6161140Z #48 0.635 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:37:38.6162184Z #48 0.636 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.10.19) 2025-09-07T10:37:38.6163526Z #48 0.636 DEBUG Found installed version of nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.10.19 2025-09-07T10:37:38.6164588Z #48 0.636 DEBUG Selecting: nvidia-curand-cu12==10.3.10.19 [installed] (installed) 2025-09-07T10:37:38.6165575Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.5.82, <11.7.5.82+) 2025-09-07T10:37:38.6167085Z #48 0.636 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.5.82, <11.7.5.82+ 2025-09-07T10:37:38.6168221Z #48 0.636 DEBUG 
Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:37:38.6169051Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12==11.7.5.82 2025-09-07T10:37:38.6170269Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.5.82 2025-09-07T10:37:38.6171586Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.5.82) 2025-09-07T10:37:38.6172768Z #48 0.636 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:37:38.6174340Z #48 0.636 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:37:38.6175482Z #48 0.636 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 2025-09-07T10:37:38.6176287Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T10:37:38.6177190Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T10:37:38.6178125Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T10:37:38.6179212Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.5.82) 2025-09-07T10:37:38.6180685Z #48 0.636 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.5.82 2025-09-07T10:37:38.6181823Z #48 0.636 DEBUG Selecting: nvidia-cusolver-cu12==11.7.5.82 [installed] (installed) 
2025-09-07T10:37:38.6183022Z #48 0.636 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T10:37:38.6184391Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cublas-cu12* 2025-09-07T10:37:38.6185305Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-nvjitlink-cu12* 2025-09-07T10:37:38.6186182Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.5.82: nvidia-cusparse-cu12* 2025-09-07T10:37:38.6187284Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.10.65, <12.5.10.65+) 2025-09-07T10:37:38.6188857Z #48 0.636 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.10.65, <12.5.10.65+ 2025-09-07T10:37:38.6190122Z #48 0.636 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:37:38.6190968Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12==12.5.10.65 2025-09-07T10:37:38.6192184Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.10.65 2025-09-07T10:37:38.6193278Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.10.65) 2025-09-07T10:37:38.6194559Z #48 0.636 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:37:38.6195763Z #48 0.636 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] 
(installed) 2025-09-07T10:37:38.6196975Z #48 0.636 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:37:38.6198296Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T10:37:38.6199350Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.10.65) 2025-09-07T10:37:38.6200828Z #48 0.636 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.10.65 (from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.10.65 2025-09-07T10:37:38.6202026Z #48 0.636 DEBUG Selecting: nvidia-cusparse-cu12==12.5.10.65 [installed] (installed) 2025-09-07T10:37:38.6202830Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.10.65: nvidia-nvjitlink-cu12* 2025-09-07T10:37:38.6203924Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T10:37:38.6205347Z #48 0.636 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T10:37:38.6206479Z #48 0.636 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:37:38.6207327Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T10:37:38.6208579Z #48 0.636 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T10:37:38.6209697Z #48 0.636 DEBUG Searching for a compatible version of 
nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T10:37:38.6210803Z #48 0.636 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:37:38.6212597Z #48 0.636 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:37:38.6213756Z #48 0.636 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:37:38.6214720Z #48 0.636 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T10:37:38.6216128Z #48 0.636 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T10:37:38.6217245Z #48 0.636 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T10:37:38.6218205Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T10:37:38.6219673Z #48 0.636 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T10:37:38.6220827Z #48 0.636 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:37:38.6221581Z #48 0.636 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T10:37:38.6223031Z #48 0.636 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T10:37:38.6224048Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 
2025-09-07T10:37:38.6225176Z #48 0.636 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:37:38.6226257Z #48 0.636 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:37:38.6227350Z #48 0.636 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:37:38.6228714Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T10:37:38.6230315Z #48 0.636 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T10:37:38.6231418Z #48 0.636 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T10:37:38.6232355Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T10:37:38.6233829Z #48 0.636 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T10:37:38.6235014Z #48 0.636 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:37:38.6235776Z #48 0.636 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T10:37:38.6236926Z #48 0.636 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T10:37:38.6238018Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 
(==3.3.20) 2025-09-07T10:37:38.6239260Z #48 0.636 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:37:38.6240409Z #48 0.636 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:37:38.6241563Z #48 0.636 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:37:38.6243001Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T10:37:38.6244424Z #48 0.636 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T10:37:38.6245559Z #48 0.636 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T10:37:38.6246499Z #48 0.636 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.79, <12.9.79+) 2025-09-07T10:37:38.6247927Z #48 0.636 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies >=12.9.79, <12.9.79+ 2025-09-07T10:37:38.6249392Z #48 0.636 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6250276Z #48 0.637 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12==12.9.79 2025-09-07T10:37:38.6251479Z #48 0.637 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.9.79: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.79 2025-09-07T10:37:38.6252498Z #48 0.637 DEBUG Searching for a compatible version 
of nvidia-nvtx-cu12 (==12.9.79) 2025-09-07T10:37:38.6253730Z #48 0.637 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6254843Z #48 0.637 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6255973Z #48 0.637 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6257368Z #48 0.637 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.79) 2025-09-07T10:37:38.6258746Z #48 0.637 DEBUG Found installed version of nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) that satisfies ==12.9.79 2025-09-07T10:37:38.6259868Z #48 0.637 DEBUG Selecting: nvidia-nvtx-cu12==12.9.79 [installed] (installed) 2025-09-07T10:37:38.6260853Z #48 0.637 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.9.86, <12.9.86+) 2025-09-07T10:37:38.6262421Z #48 0.637 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.9.86, <12.9.86+ 2025-09-07T10:37:38.6263764Z #48 0.637 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:37:38.6264581Z #48 0.637 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12==12.9.86 2025-09-07T10:37:38.6265793Z #48 0.637 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.9.86: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.9.86 2025-09-07T10:37:38.6266945Z #48 0.637 
DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.9.86) 2025-09-07T10:37:38.6268197Z #48 0.637 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:37:38.6269387Z #48 0.637 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:37:38.6270580Z #48 0.637 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:37:38.6272074Z #48 0.637 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.9.86) 2025-09-07T10:37:38.6273534Z #48 0.637 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.9.86 2025-09-07T10:37:38.6274713Z #48 0.637 DEBUG Selecting: nvidia-nvjitlink-cu12==12.9.86 [installed] (installed) 2025-09-07T10:37:38.6275785Z #48 0.637 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.14.1.1, <1.14.1.1+) 2025-09-07T10:37:38.6277757Z #48 0.637 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.14.1.1, <1.14.1.1+ 2025-09-07T10:37:38.6279024Z #48 0.637 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:37:38.6279811Z #48 0.637 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12==1.14.1.1 2025-09-07T10:37:38.6280955Z #48 0.637 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.14.1.1: nvidia-cufile-cu12{platform_machine == 
'x86_64' and sys_platform == 'linux'}==1.14.1.1 2025-09-07T10:37:38.6281980Z #48 0.637 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.14.1.1) 2025-09-07T10:37:38.6283207Z #48 0.637 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:37:38.6284846Z #48 0.637 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:37:38.6285992Z #48 0.637 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:37:38.6286918Z #48 0.637 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.14.1.1) 2025-09-07T10:37:38.6288324Z #48 0.637 DEBUG Found installed version of nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.14.1.1 2025-09-07T10:37:38.6289486Z #48 0.637 DEBUG Selecting: nvidia-cufile-cu12==1.14.1.1 [installed] (installed) 2025-09-07T10:37:38.6290296Z #48 0.637 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T10:37:38.6292020Z #48 0.637 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:37:38.6293372Z #48 0.637 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:37:38.6294241Z #48 0.637 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T10:37:38.6295386Z #48 0.637 DEBUG Adding transitive dependency for 
pytorch-triton==3.4.0+gitf7888497: pytorch-triton{sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T10:37:38.6296439Z #48 0.637 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T10:37:38.6297838Z #48 0.637 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:37:38.6299185Z #48 0.637 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:37:38.6300530Z #48 0.637 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:37:38.6301981Z #48 0.637 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T10:37:38.6303083Z #48 0.637 DEBUG Searching for a compatible version of pytorch-triton{sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T10:37:38.6304515Z #48 0.637 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T10:37:38.6305812Z #48 0.637 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T10:37:38.6306591Z #48 0.637 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T10:37:38.6307276Z #48 0.637 DEBUG Searching for a compatible version of ninja (*) 2025-09-07T10:37:38.6307834Z #48 0.637 DEBUG Found installed version of ninja==1.13.0 that satisfies * 2025-09-07T10:37:38.6308377Z #48 0.637 DEBUG Selecting: ninja==1.13.0 [installed] (installed) 2025-09-07T10:37:38.6309221Z #48 0.637 DEBUG Found installed version of setuptools==78.1.0 (from 
file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=40.8.0 2025-09-07T10:37:38.6310086Z #48 0.637 DEBUG Searching for a compatible version of requests (*) 2025-09-07T10:37:38.6310656Z #48 0.637 DEBUG Found installed version of requests==2.32.5 that satisfies * 2025-09-07T10:37:38.6311269Z #48 0.637 DEBUG Selecting: requests==2.32.5 [installed] (installed) 2025-09-07T10:37:38.6311913Z #48 0.637 DEBUG Adding transitive dependency for requests==2.32.5: charset-normalizer>=2, <4 2025-09-07T10:37:38.6312628Z #48 0.637 DEBUG Adding transitive dependency for requests==2.32.5: idna>=2.5, <4 2025-09-07T10:37:38.6313304Z #48 0.637 DEBUG Adding transitive dependency for requests==2.32.5: urllib3>=1.21.1, <3 2025-09-07T10:37:38.6313994Z #48 0.637 DEBUG Adding transitive dependency for requests==2.32.5: certifi>=2017.4.17 2025-09-07T10:37:38.6314662Z #48 0.637 DEBUG Searching for a compatible version of cuda-python (<=12.9+) 2025-09-07T10:37:38.6315351Z #48 0.637 DEBUG Selecting: cuda-python==12.9.0 [compatible] (cuda_python-12.9.0-py3-none-any.whl) 2025-09-07T10:37:38.6316180Z #48 0.637 DEBUG Adding transitive dependency for cuda-python==12.9.0: cuda-bindings>=12.9.0, <12.10.dev0 2025-09-07T10:37:38.6316867Z #48 0.637 DEBUG Searching for a compatible version of pynvml (*) 2025-09-07T10:37:38.6317484Z #48 0.637 DEBUG Selecting: pynvml==13.0.1 [compatible] (pynvml-13.0.1-py3-none-any.whl) 2025-09-07T10:37:38.6318190Z #48 0.637 DEBUG Adding transitive dependency for pynvml==13.0.1: nvidia-ml-py>=12.0.0 2025-09-07T10:37:38.6318803Z #48 0.637 DEBUG Searching for a compatible version of einops (*) 2025-09-07T10:37:38.6319362Z #48 0.637 DEBUG Found installed version of einops==0.8.1 that satisfies * 2025-09-07T10:37:38.6319906Z #48 0.637 DEBUG Selecting: einops==0.8.1 [installed] (installed) 2025-09-07T10:37:38.6320465Z #48 0.637 DEBUG Searching for a compatible version of packaging (>=24.2) 2025-09-07T10:37:38.6321090Z #48 0.637 DEBUG Found installed version of 
packaging==25.0 that satisfies >=24.2 2025-09-07T10:37:38.6321677Z #48 0.637 DEBUG Selecting: packaging==25.0 [installed] (installed) 2025-09-07T10:37:38.6322328Z #48 0.637 DEBUG Searching for a compatible version of nvidia-cudnn-frontend (>=1.13.0) 2025-09-07T10:37:38.6323394Z #48 0.637 DEBUG Selecting: nvidia-cudnn-frontend==1.14.1 [compatible] (nvidia_cudnn_frontend-1.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6324384Z #48 0.637 DEBUG Found stale response for: https://pypi.org/simple/idna/ 2025-09-07T10:37:38.6325011Z #48 0.637 DEBUG Sending revalidation request for: https://pypi.org/simple/idna/ 2025-09-07T10:37:38.6325652Z #48 0.637 DEBUG No cache entry for: https://pypi.org/simple/nvidia-ml-py/ 2025-09-07T10:37:38.6326251Z #48 0.638 DEBUG No cache entry for: https://pypi.org/simple/cuda-bindings/ 2025-09-07T10:37:38.6326877Z #48 0.638 DEBUG Found stale response for: https://pypi.org/simple/certifi/ 2025-09-07T10:37:38.6327538Z #48 0.638 DEBUG Sending revalidation request for: https://pypi.org/simple/certifi/ 2025-09-07T10:37:38.6328185Z #48 0.638 DEBUG Found stale response for: https://pypi.org/simple/urllib3/ 2025-09-07T10:37:38.6328845Z #48 0.638 DEBUG Sending revalidation request for: https://pypi.org/simple/urllib3/ 2025-09-07T10:37:38.6329578Z #48 0.638 DEBUG Found stale response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T10:37:38.6330331Z #48 0.638 DEBUG Sending revalidation request for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T10:37:38.6331159Z #48 0.639 DEBUG Found not-modified response for: https://pypi.org/simple/certifi/ 2025-09-07T10:37:38.6332053Z #48 0.639 DEBUG Found not-modified response for: https://pypi.org/simple/idna/ 2025-09-07T10:37:38.6332762Z #48 0.640 DEBUG Found installed version of certifi==2025.8.3 that satisfies >=2017.4.17 2025-09-07T10:37:38.6333474Z #48 0.640 DEBUG Found not-modified response for: https://pypi.org/simple/urllib3/ 
2025-09-07T10:37:38.6334237Z #48 0.640 DEBUG Found not-modified response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T10:37:38.6334973Z #48 0.641 DEBUG Found installed version of idna==3.10 that satisfies >=2.5, <4 2025-09-07T10:37:38.6335648Z #48 0.641 DEBUG Found installed version of urllib3==2.5.0 that satisfies >=1.21.1, <3 2025-09-07T10:37:38.6337236Z #48 0.641 DEBUG No cache entry for: https://files.pythonhosted.org/packages/26/15/3dbe02186dc0daaa8410aa1c1c368d36967b88035ce1cea663e9ba11312a/cuda_bindings-12.9.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T10:37:38.6339424Z #48 0.642 DEBUG No cache entry for: https://files.pythonhosted.org/packages/f9/96/88a5cb161c61cab2ee65b5aa61e612901fbcb1660024f0ccb26fcb02a17c/nvidia_ml_py-13.580.65-py3-none-any.whl.metadata 2025-09-07T10:37:38.6340786Z #48 0.642 DEBUG Found installed version of charset-normalizer==3.4.3 that satisfies >=2, <4 2025-09-07T10:37:38.6341475Z #48 0.642 DEBUG Searching for a compatible version of filelock (*) 2025-09-07T10:37:38.6342291Z #48 0.642 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T10:37:38.6343122Z #48 0.642 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T10:37:38.6343843Z #48 0.642 DEBUG Searching for a compatible version of typing-extensions (>=4.10.0) 2025-09-07T10:37:38.6344851Z #48 0.642 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T10:37:38.6345817Z #48 0.642 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T10:37:38.6346532Z #48 0.642 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (*) 2025-09-07T10:37:38.6347496Z #48 0.642 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies * 
2025-09-07T10:37:38.6348333Z #48 0.642 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:37:38.6349311Z #48 0.642 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T10:37:38.6350323Z #48 0.642 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T10:37:38.6351140Z #48 0.642 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T10:37:38.6352128Z #48 0.642 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:37:38.6353007Z #48 0.642 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:37:38.6353894Z #48 0.642 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:37:38.6354947Z #48 0.642 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T10:37:38.6356033Z #48 0.642 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T10:37:38.6356918Z #48 0.642 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T10:37:38.6357492Z #48 0.642 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T10:37:38.6358332Z #48 0.642 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T10:37:38.6359138Z #48 0.642 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T10:37:38.6359743Z #48 0.642 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T10:37:38.6360406Z #48 0.642 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T10:37:38.6361357Z #48 0.642 DEBUG Found installed version of networkx==3.5 (from 
file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T10:37:38.6362152Z #48 0.642 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T10:37:38.6362678Z #48 0.642 DEBUG Searching for a compatible version of jinja2 (*) 2025-09-07T10:37:38.6363425Z #48 0.642 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T10:37:38.6364183Z #48 0.642 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T10:37:38.6364811Z #48 0.642 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T10:37:38.6365434Z #48 0.642 DEBUG Searching for a compatible version of fsspec (>=0.8.5) 2025-09-07T10:37:38.6366363Z #48 0.642 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T10:37:38.6367185Z #48 0.642 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T10:37:38.6367791Z #48 0.643 DEBUG Searching for a compatible version of charset-normalizer (>=2, <4) 2025-09-07T10:37:38.6368503Z #48 0.643 DEBUG Found installed version of charset-normalizer==3.4.3 that satisfies >=2, <4 2025-09-07T10:37:38.6369201Z #48 0.643 DEBUG Selecting: charset-normalizer==3.4.3 [installed] (installed) 2025-09-07T10:37:38.6369790Z #48 0.643 DEBUG Searching for a compatible version of idna (>=2.5, <4) 2025-09-07T10:37:38.6370389Z #48 0.643 DEBUG Found installed version of idna==3.10 that satisfies >=2.5, <4 2025-09-07T10:37:38.6371034Z #48 0.643 DEBUG Selecting: idna==3.10 [installed] (installed) 2025-09-07T10:37:38.6371752Z #48 0.643 DEBUG Searching for a compatible version of urllib3 (>=1.21.1, <3) 2025-09-07T10:37:38.6372428Z #48 0.643 DEBUG Found installed version of urllib3==2.5.0 that satisfies >=1.21.1, <3 2025-09-07T10:37:38.6373049Z #48 0.643 DEBUG Selecting: urllib3==2.5.0 [installed] (installed) 2025-09-07T10:37:38.6373633Z #48 0.643 DEBUG Searching for a compatible 
version of certifi (>=2017.4.17) 2025-09-07T10:37:38.6374299Z #48 0.643 DEBUG Found installed version of certifi==2025.8.3 that satisfies >=2017.4.17 2025-09-07T10:37:38.6374945Z #48 0.643 DEBUG Selecting: certifi==2025.8.3 [installed] (installed) 2025-09-07T10:37:38.6375604Z #48 0.643 DEBUG Searching for a compatible version of cuda-bindings (>=12.9.0, <12.10.dev0) 2025-09-07T10:37:38.6376642Z #48 0.643 DEBUG Selecting: cuda-bindings==12.9.2 [compatible] (cuda_bindings-12.9.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6377606Z #48 0.643 DEBUG Found stale response for: https://pypi.org/simple/mpmath/ 2025-09-07T10:37:38.6378277Z #48 0.643 DEBUG Sending revalidation request for: https://pypi.org/simple/mpmath/ 2025-09-07T10:37:38.6378977Z #48 0.643 DEBUG Found stale response for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:37:38.6379693Z #48 0.643 DEBUG Sending revalidation request for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:37:38.6380412Z #48 0.644 DEBUG Found not-modified response for: https://pypi.org/simple/mpmath/ 2025-09-07T10:37:38.6381237Z #48 0.644 DEBUG Adding transitive dependency for cuda-bindings==12.9.2: cuda-pathfinder>=1.1, <2.dev0 2025-09-07T10:37:38.6381994Z #48 0.644 DEBUG Searching for a compatible version of nvidia-ml-py (>=12.0.0) 2025-09-07T10:37:38.6382755Z #48 0.644 DEBUG Selecting: nvidia-ml-py==13.580.65 [compatible] (nvidia_ml_py-13.580.65-py3-none-any.whl) 2025-09-07T10:37:38.6383880Z #48 0.644 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T10:37:38.6384772Z #48 0.644 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T10:37:38.6385866Z #48 0.644 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T10:37:38.6386664Z #48 0.644 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 
2025-09-07T10:37:38.6387243Z #48 0.644 DEBUG No cache entry for: https://pypi.org/simple/cuda-pathfinder/ 2025-09-07T10:37:38.6387914Z #48 0.644 DEBUG Found not-modified response for: https://pypi.org/simple/markupsafe/ 2025-09-07T10:37:38.6388580Z #48 0.645 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T10:37:38.6389625Z #48 0.645 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T10:37:38.6391154Z #48 0.645 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T10:37:38.6392168Z #48 0.645 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T10:37:38.6392795Z #48 0.646 DEBUG Searching for a compatible version of cuda-pathfinder (>=1.1, <2.dev0) 2025-09-07T10:37:38.6393574Z #48 0.646 DEBUG Selecting: cuda-pathfinder==1.2.1 [compatible] (cuda_pathfinder-1.2.1-py3-none-any.whl) 2025-09-07T10:37:38.6394949Z #48 0.646 DEBUG No cache entry for: https://files.pythonhosted.org/packages/22/54/6231878f6fc490f222c87190ce12196b67b7700b30818882a87f478e4944/cuda_pathfinder-1.2.1-py3-none-any.whl.metadata 2025-09-07T10:37:38.6399104Z #48 0.648 DEBUG Tried 42 versions: certifi 1, charset-normalizer 1, cuda-bindings 1, cuda-pathfinder 1, cuda-python 1, einops 1, filelock 1, flashinfer-python 1, fsspec 1, idna 1, jinja2 1, markupsafe 1, mpmath 1, networkx 1, ninja 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cudnn-frontend 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-ml-py 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, packaging 1, pynvml 1, 
pytorch-triton 1, requests 1, setuptools 1, sympy 1, torch 1, typing-extensions 1, urllib3 1 2025-09-07T10:37:38.6402542Z #48 0.648 DEBUG marker environment resolution took 0.040s 2025-09-07T10:37:38.6402955Z #48 0.648 Resolved 42 packages in 42ms 2025-09-07T10:37:38.6403794Z #48 0.649 DEBUG Requirement already installed: nvidia-cublas-cu12==12.9.1.4 (from file:///dist/nvidia_cublas_cu12-12.9.1.4-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:37:38.6405269Z #48 0.649 DEBUG Identified uncached distribution: flashinfer-python @ file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T10:37:38.6406805Z #48 0.649 DEBUG Requirement already installed: nvidia-cufile-cu12==1.14.1.1 (from file:///dist/nvidia_cufile_cu12-1.14.1.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:37:38.6408238Z #48 0.649 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.5.82 (from file:///dist/nvidia_cusolver_cu12-11.7.5.82-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:37:38.6409766Z #48 0.649 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6411004Z #48 0.649 DEBUG Requirement already installed: urllib3==2.5.0 2025-09-07T10:37:38.6412112Z #48 0.649 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.10.19 (from file:///dist/nvidia_curand_cu12-10.3.10.19-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:37:38.6413059Z #48 0.649 DEBUG Requirement already installed: packaging==25.0 2025-09-07T10:37:38.6413609Z #48 0.649 DEBUG Identified uncached distribution: pynvml==13.0.1 2025-09-07T10:37:38.6414581Z #48 0.649 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.9.79 (from file:///dist/nvidia_cuda_cupti_cu12-12.9.79-py3-none-manylinux_2_25_x86_64.whl) 2025-09-07T10:37:38.6416012Z #48 0.649 DEBUG Requirement already installed: 
nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:37:38.6417254Z #48 0.649 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T10:37:38.6418464Z #48 0.649 DEBUG Requirement already installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T10:37:38.6419420Z #48 0.649 DEBUG Requirement already installed: einops==0.8.1 2025-09-07T10:37:38.6419919Z #48 0.649 DEBUG Requirement already installed: ninja==1.13.0 2025-09-07T10:37:38.6420418Z #48 0.649 DEBUG Requirement already installed: numpy==2.2.6 2025-09-07T10:37:38.6421531Z #48 0.649 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.9.79 (from file:///dist/nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:37:38.6422615Z #48 0.649 DEBUG Requirement already installed: requests==2.32.5 2025-09-07T10:37:38.6423793Z #48 0.649 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:37:38.6425108Z #48 0.649 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T10:37:38.6426124Z #48 0.649 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T10:37:38.6427038Z #48 0.649 DEBUG Requirement already installed: sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T10:37:38.6428176Z #48 0.649 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T10:37:38.6429648Z #48 0.649 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.10.65 
(from file:///dist/nvidia_cusparse_cu12-12.5.10.65-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:37:38.6430669Z #48 0.649 DEBUG Requirement already installed: idna==3.10 2025-09-07T10:37:38.6431622Z #48 0.649 DEBUG Requirement already installed: nvidia-cufft-cu12==11.4.1.4 (from file:///dist/nvidia_cufft_cu12-11.4.1.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T10:37:38.6432618Z #48 0.649 DEBUG Requirement already installed: certifi==2025.8.3 2025-09-07T10:37:38.6433164Z #48 0.649 DEBUG Identified uncached distribution: cuda-python==12.9.0 2025-09-07T10:37:38.6434268Z #48 0.649 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.9.86 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:37:38.6435729Z #48 0.649 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T10:37:38.6436824Z #48 0.649 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T10:37:38.6437586Z #48 0.649 DEBUG Identified uncached distribution: cuda-bindings==12.9.2 2025-09-07T10:37:38.6438188Z #48 0.649 DEBUG Identified uncached distribution: cuda-pathfinder==1.2.1 2025-09-07T10:37:38.6439010Z #48 0.649 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T10:37:38.6440193Z #48 0.649 DEBUG Requirement already installed: torch==2.9.0.dev20250901+cu129 (from file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6441189Z #48 0.649 DEBUG Identified uncached distribution: nvidia-ml-py==13.580.65 2025-09-07T10:37:38.6441973Z #48 0.649 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T10:37:38.6442906Z #48 0.649 DEBUG Requirement already 
installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T10:37:38.6444079Z #48 0.649 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.9.79 (from file:///dist/nvidia_nvtx_cu12-12.9.79-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T10:37:38.6445103Z #48 0.649 DEBUG Identified uncached distribution: nvidia-cudnn-frontend==1.14.1 2025-09-07T10:37:38.6445739Z #48 0.649 DEBUG Requirement already installed: charset-normalizer==3.4.3 2025-09-07T10:37:38.6446817Z #48 0.649 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.9.86 (from file:///dist/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T10:37:38.6447813Z #48 0.649 DEBUG Unnecessary package: pyyaml==6.0.2 2025-09-07T10:37:38.6448267Z #48 0.649 DEBUG Unnecessary package: aiohappyeyeballs==2.6.1 2025-09-07T10:37:38.6449037Z #48 0.649 DEBUG Unnecessary package: aiohttp==3.12.15 2025-09-07T10:37:38.6449655Z #48 0.649 DEBUG Unnecessary package: aiosignal==1.4.0 2025-09-07T10:37:38.6450136Z #48 0.649 DEBUG Unnecessary package: annotated-types==0.7.0 2025-09-07T10:37:38.6450598Z #48 0.649 DEBUG Unnecessary package: anyio==4.10.0 2025-09-07T10:37:38.6451101Z #48 0.649 DEBUG Unnecessary package: astor==0.8.1 2025-09-07T10:37:38.6451518Z #48 0.649 DEBUG Unnecessary package: attrs==25.3.0 2025-09-07T10:37:38.6451954Z #48 0.649 DEBUG Unnecessary package: blake3==1.0.5 2025-09-07T10:37:38.6452389Z #48 0.649 DEBUG Unnecessary package: build==1.3.0 2025-09-07T10:37:38.6452822Z #48 0.649 DEBUG Unnecessary package: cachetools==6.2.0 2025-09-07T10:37:38.6453489Z #48 0.649 DEBUG Unnecessary package: cbor2==5.7.0 2025-09-07T10:37:38.6454224Z #48 0.649 DEBUG Unnecessary package: cffi==1.17.1 2025-09-07T10:37:38.6454978Z #48 0.649 DEBUG Unnecessary package: click==8.2.1 2025-09-07T10:37:38.6455551Z #48 0.649 DEBUG Unnecessary package: cloudpickle==3.1.1 2025-09-07T10:37:38.6456062Z #48 0.649 DEBUG Unnecessary package: 
compressed-tensors==0.11.0 2025-09-07T10:37:38.6456575Z #48 0.649 DEBUG Unnecessary package: cupy-cuda12x==13.6.0 2025-09-07T10:37:38.6457025Z #48 0.649 DEBUG Unnecessary package: depyf==0.19.0 2025-09-07T10:37:38.6457450Z #48 0.649 DEBUG Unnecessary package: dill==0.4.0 2025-09-07T10:37:38.6457875Z #48 0.649 DEBUG Unnecessary package: diskcache==5.6.3 2025-09-07T10:37:38.6458316Z #48 0.649 DEBUG Unnecessary package: distro==1.9.0 2025-09-07T10:37:38.6458745Z #48 0.649 DEBUG Unnecessary package: dnspython==2.7.0 2025-09-07T10:37:38.6459222Z #48 0.649 DEBUG Unnecessary package: email-validator==2.3.0 2025-09-07T10:37:38.6459684Z #48 0.649 DEBUG Unnecessary package: fastapi==0.116.1 2025-09-07T10:37:38.6460252Z #48 0.649 DEBUG Unnecessary package: fastapi-cli==0.0.10 2025-09-07T10:37:38.6460758Z #48 0.649 DEBUG Unnecessary package: fastapi-cloud-cli==0.1.5 2025-09-07T10:37:38.6461285Z #48 0.649 DEBUG Unnecessary package: fastrlock==0.8.3 2025-09-07T10:37:38.6461744Z #48 0.649 DEBUG Unnecessary package: frozendict==2.4.6 2025-09-07T10:37:38.6462194Z #48 0.649 DEBUG Unnecessary package: frozenlist==1.7.0 2025-09-07T10:37:38.6462634Z #48 0.649 DEBUG Unnecessary package: gguf==0.17.1 2025-09-07T10:37:38.6463150Z #48 0.649 DEBUG Unnecessary package: h11==0.16.0 2025-09-07T10:37:38.6463569Z #48 0.649 DEBUG Unnecessary package: hf-xet==1.1.9 2025-09-07T10:37:38.6464000Z #48 0.649 DEBUG Unnecessary package: httpcore==1.0.9 2025-09-07T10:37:38.6464489Z #48 0.649 DEBUG Unnecessary package: httptools==0.6.4 2025-09-07T10:37:38.6464920Z #48 0.649 DEBUG Unnecessary package: httpx==0.28.1 2025-09-07T10:37:38.6465368Z #48 0.649 DEBUG Unnecessary package: huggingface-hub==0.34.4 2025-09-07T10:37:38.6465851Z #48 0.649 DEBUG Unnecessary package: interegular==0.3.3 2025-09-07T10:37:38.6466273Z #48 0.649 DEBUG Unnecessary package: jiter==0.10.0 2025-09-07T10:37:38.6466711Z #48 0.649 DEBUG Unnecessary package: jsonschema==4.25.1 2025-09-07T10:37:38.6467243Z #48 0.649 DEBUG Unnecessary 
package: jsonschema-specifications==2025.4.1 2025-09-07T10:37:38.6467762Z #48 0.649 DEBUG Unnecessary package: lark==1.2.2 2025-09-07T10:37:38.6468193Z #48 0.649 DEBUG Unnecessary package: llguidance==0.7.30 2025-09-07T10:37:38.6468627Z #48 0.649 DEBUG Unnecessary package: llvmlite==0.44.0 2025-09-07T10:37:38.6469110Z #48 0.649 DEBUG Unnecessary package: lm-format-enforcer==0.11.3 2025-09-07T10:37:38.6469605Z #48 0.649 DEBUG Unnecessary package: markdown-it-py==4.0.0 2025-09-07T10:37:38.6470055Z #48 0.649 DEBUG Unnecessary package: mdurl==0.1.2 2025-09-07T10:37:38.6470488Z #48 0.649 DEBUG Unnecessary package: mistral-common==1.8.4 2025-09-07T10:37:38.6470939Z #48 0.649 DEBUG Unnecessary package: msgpack==1.1.1 2025-09-07T10:37:38.6471376Z #48 0.649 DEBUG Unnecessary package: msgspec==0.19.0 2025-09-07T10:37:38.6471800Z #48 0.649 DEBUG Unnecessary package: multidict==6.6.4 2025-09-07T10:37:38.6472229Z #48 0.649 DEBUG Unnecessary package: numba==0.61.2 2025-09-07T10:37:38.6472679Z #48 0.649 DEBUG Unnecessary package: openai==1.106.1 2025-09-07T10:37:38.6473135Z #48 0.649 DEBUG Unnecessary package: openai-harmony==0.0.4 2025-09-07T10:37:38.6473657Z #48 0.649 DEBUG Unnecessary package: opencv-python-headless==4.12.0.88 2025-09-07T10:37:38.6474179Z #48 0.649 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T10:37:38.6474627Z #48 0.649 DEBUG Unnecessary package: outlines-core==0.2.10 2025-09-07T10:37:38.6475167Z #48 0.649 DEBUG Unnecessary package: partial-json-parser==0.2.1.1.post6 2025-09-07T10:37:38.6476067Z #48 0.649 DEBUG Unnecessary package: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6476865Z #48 0.649 DEBUG Preserving seed package: pip==25.2 2025-09-07T10:37:38.6477349Z #48 0.649 DEBUG Unnecessary package: prometheus-client==0.22.1 2025-09-07T10:37:38.6477939Z #48 0.649 DEBUG Unnecessary package: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T10:37:38.6478518Z #48 
0.649 DEBUG Unnecessary package: propcache==0.3.2 2025-09-07T10:37:38.6478963Z #48 0.649 DEBUG Unnecessary package: protobuf==6.32.0 2025-09-07T10:37:38.6479383Z #48 0.649 DEBUG Unnecessary package: psutil==7.0.0 2025-09-07T10:37:38.6479821Z #48 0.649 DEBUG Unnecessary package: py-cpuinfo==9.0.0 2025-09-07T10:37:38.6480251Z #48 0.649 DEBUG Unnecessary package: pybase64==1.4.2 2025-09-07T10:37:38.6480693Z #48 0.649 DEBUG Unnecessary package: pycountry==24.6.1 2025-09-07T10:37:38.6481125Z #48 0.649 DEBUG Unnecessary package: pycparser==2.22 2025-09-07T10:37:38.6481565Z #48 0.649 DEBUG Unnecessary package: pydantic==2.11.7 2025-09-07T10:37:38.6482014Z #48 0.649 DEBUG Unnecessary package: pydantic-core==2.33.2 2025-09-07T10:37:38.6482528Z #48 0.649 DEBUG Unnecessary package: pydantic-extra-types==2.10.5 2025-09-07T10:37:38.6483055Z #48 0.649 DEBUG Unnecessary package: pygments==2.19.2 2025-09-07T10:37:38.6483504Z #48 0.649 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T10:37:38.6484043Z #48 0.649 DEBUG Unnecessary package: python-dotenv==1.1.1 2025-09-07T10:37:38.6484530Z #48 0.649 DEBUG Unnecessary package: python-json-logger==3.3.0 2025-09-07T10:37:38.6485048Z #48 0.649 DEBUG Unnecessary package: python-multipart==0.0.20 2025-09-07T10:37:38.6485504Z #48 0.649 DEBUG Unnecessary package: pyzmq==27.0.2 2025-09-07T10:37:38.6485928Z #48 0.649 DEBUG Unnecessary package: ray==2.49.1 2025-09-07T10:37:38.6486362Z #48 0.649 DEBUG Unnecessary package: referencing==0.36.2 2025-09-07T10:37:38.6486831Z #48 0.649 DEBUG Unnecessary package: regex==2025.9.1 2025-09-07T10:37:38.6487254Z #48 0.649 DEBUG Unnecessary package: rich==14.1.0 2025-09-07T10:37:38.6487681Z #48 0.649 DEBUG Unnecessary package: rich-toolkit==0.15.1 2025-09-07T10:37:38.6488134Z #48 0.649 DEBUG Unnecessary package: rignore==0.6.4 2025-09-07T10:37:38.6488562Z #48 0.649 DEBUG Unnecessary package: rpds-py==0.27.1 2025-09-07T10:37:38.6489015Z #48 0.649 DEBUG Unnecessary package: safetensors==0.6.2 
2025-09-07T10:37:38.6489447Z #48 0.649 DEBUG Unnecessary package: scipy==1.16.1 2025-09-07T10:37:38.6489899Z #48 0.649 DEBUG Unnecessary package: sentencepiece==0.2.1 2025-09-07T10:37:38.6490366Z #48 0.649 DEBUG Unnecessary package: sentry-sdk==2.37.0 2025-09-07T10:37:38.6490812Z #48 0.649 DEBUG Unnecessary package: setproctitle==1.3.7 2025-09-07T10:37:38.6491391Z #48 0.649 DEBUG Unnecessary package: shellingham==1.5.4 2025-09-07T10:37:38.6491982Z #48 0.649 DEBUG Unnecessary package: six==1.17.0 2025-09-07T10:37:38.6492417Z #48 0.649 DEBUG Unnecessary package: sniffio==1.3.1 2025-09-07T10:37:38.6492862Z #48 0.649 DEBUG Unnecessary package: soundfile==0.13.1 2025-09-07T10:37:38.6493318Z #48 0.649 DEBUG Unnecessary package: soxr==0.5.0.post1 2025-09-07T10:37:38.6493769Z #48 0.649 DEBUG Unnecessary package: starlette==0.47.3 2025-09-07T10:37:38.6494211Z #48 0.649 DEBUG Unnecessary package: tiktoken==0.11.0 2025-09-07T10:37:38.6494664Z #48 0.649 DEBUG Unnecessary package: tokenizers==0.22.0 2025-09-07T10:37:38.6495662Z #48 0.649 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250901+cu129 (from file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6497147Z #48 0.649 DEBUG Unnecessary package: torchvision==0.24.0.dev20250901+cu129 (from file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T10:37:38.6498127Z #48 0.649 DEBUG Unnecessary package: tqdm==4.67.1 2025-09-07T10:37:38.6498815Z #48 0.649 DEBUG Unnecessary package: transformers==4.56.1 2025-09-07T10:37:38.6499283Z #48 0.649 DEBUG Unnecessary package: triton==3.4.0 2025-09-07T10:37:38.6499705Z #48 0.649 DEBUG Unnecessary package: typer==0.17.4 2025-09-07T10:37:38.6500184Z #48 0.649 DEBUG Unnecessary package: typing-inspection==0.4.1 2025-09-07T10:37:38.6500652Z #48 0.649 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T10:37:38.6501098Z #48 0.649 DEBUG Unnecessary package: uvicorn==0.35.0 
2025-09-07T10:37:38.6501526Z #48 0.649 DEBUG Unnecessary package: uvloop==0.21.0 2025-09-07T10:37:38.6502584Z #48 0.649 DEBUG Unnecessary package: vllm==0.10.2rc2.dev125+g4172235ab.d20250907.cu129 (from file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl) 2025-09-07T10:37:38.6503750Z #48 0.649 DEBUG Unnecessary package: watchfiles==1.1.0 2025-09-07T10:37:38.6504188Z #48 0.649 DEBUG Unnecessary package: websockets==15.0.1 2025-09-07T10:37:38.6504623Z #48 0.649 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T10:37:38.6505540Z #48 0.649 DEBUG Unnecessary package: xformers==0.0.33+5d4b92a5.d20250907 (from file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl) 2025-09-07T10:37:38.6506475Z #48 0.649 DEBUG Unnecessary package: xgrammar==0.1.23 2025-09-07T10:37:38.6506902Z #48 0.649 DEBUG Unnecessary package: yarl==1.20.1 2025-09-07T10:37:38.6508340Z #48 0.649 DEBUG No cache entry for: https://files.pythonhosted.org/packages/b7/b8/5f812452c653447b4c09fec3cf0c5192abab1ce18358fcfab16a70113cfa/nvidia_cudnn_frontend-1.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:37:38.6510384Z #48 0.649 DEBUG No cache entry for: https://files.pythonhosted.org/packages/f9/96/88a5cb161c61cab2ee65b5aa61e612901fbcb1660024f0ccb26fcb02a17c/nvidia_ml_py-13.580.65-py3-none-any.whl 2025-09-07T10:37:38.6512363Z #48 0.649 DEBUG No cache entry for: https://files.pythonhosted.org/packages/26/15/3dbe02186dc0daaa8410aa1c1c368d36967b88035ce1cea663e9ba11312a/cuda_bindings-12.9.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:37:38.6514372Z #48 0.649 DEBUG No cache entry for: https://files.pythonhosted.org/packages/22/54/6231878f6fc490f222c87190ce12196b67b7700b30818882a87f478e4944/cuda_pathfinder-1.2.1-py3-none-any.whl 2025-09-07T10:37:38.6516118Z #48 0.649 DEBUG No cache entry for: 
https://files.pythonhosted.org/packages/d7/4a/cac76c174bb439a0c46c9a4413fcbea5c6cabfb01879f7bbdb9fdfaed76c/pynvml-13.0.1-py3-none-any.whl 2025-09-07T10:37:38.6517846Z #48 0.649 DEBUG No cache entry for: https://files.pythonhosted.org/packages/24/3c/4475aebeaab9651f2e61000fbe76f91a476d371dbfbf0a1cf46e689af253/cuda_python-12.9.0-py3-none-any.whl 2025-09-07T10:37:38.6518953Z #48 0.652 Downloading nvidia-cudnn-frontend (1.7MiB) 2025-09-07T10:37:38.6519385Z #48 0.652 Downloading cuda-bindings (11.9MiB) 2025-09-07T10:37:38.6519774Z #48 0.776 Downloading nvidia-cudnn-frontend 2025-09-07T10:37:38.8334056Z #48 0.869 Downloading cuda-bindings 2025-09-07T10:37:38.8334630Z #48 0.869 Prepared 7 packages in 219ms 2025-09-07T10:37:39.0482824Z #48 1.234 Installed 7 packages in 364ms 2025-09-07T10:37:39.0483511Z #48 1.234 + cuda-bindings==12.9.2 2025-09-07T10:37:39.0483857Z #48 1.234 + cuda-pathfinder==1.2.1 2025-09-07T10:37:39.0484186Z #48 1.234 + cuda-python==12.9.0 2025-09-07T10:37:39.0485319Z #48 1.234 + flashinfer-python==0.2.14.post1 (from file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl) 2025-09-07T10:37:39.0486615Z #48 1.234 + nvidia-cudnn-frontend==1.14.1 2025-09-07T10:37:39.0486982Z #48 1.234 + nvidia-ml-py==13.580.65 2025-09-07T10:37:39.0487490Z #48 1.234 + pynvml==13.0.1 2025-09-07T10:37:39.1986191Z #48 1.234 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T10:37:48.6870478Z #48 DONE 10.9s 2025-09-07T10:37:48.8413405Z 2025-09-07T10:37:48.8414592Z #49 [vllm-base 17/18] RUN pip freeze | grep -E 'torch|xformers|vllm|flashinfer' 2025-09-07T10:37:49.5757279Z #49 0.885 flashinfer-python @ file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T10:37:49.5758553Z #49 0.885 pytorch-triton @ file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:37:49.5759542Z #49 0.885 torch @ 
file:///dist/torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:37:49.5760425Z #49 0.885 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:37:49.5761382Z #49 0.885 torchvision @ file:///dist/torchvision-0.24.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T10:37:49.5762446Z #49 0.885 vllm @ file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl 2025-09-07T10:37:49.5763359Z #49 0.885 xformers @ file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T10:37:49.7429781Z #49 DONE 0.9s 2025-09-07T10:37:49.7430279Z 2025-09-07T10:37:49.7431603Z #50 [vllm-base 18/18] RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio\|^xformers\|^vllm\|^flashinfer' > build_summary.txt 2025-09-07T10:37:50.3862821Z #50 0.794 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T10:37:50.6053162Z #50 DONE 0.8s 2025-09-07T10:37:50.6053363Z 2025-09-07T10:37:50.6053767Z #51 [export-wheels 3/4] COPY --from=vllm-base /workspace/build_summary.txt /wheels/build_summary.txt 2025-09-07T10:37:50.6054571Z #51 DONE 0.0s 2025-09-07T10:37:50.8800019Z 2025-09-07T10:37:50.8800922Z #52 [export-wheels 4/4] COPY --from=vllm-base /workspace/wheels/flashinfer /wheels/flashinfer-python 2025-09-07T10:37:51.0018591Z #52 DONE 0.0s 2025-09-07T10:37:51.0018833Z 2025-09-07T10:37:51.0018980Z #53 exporting to client directory 2025-09-07T10:37:51.0019336Z #53 copying files 54.64MB 0.1s 2025-09-07T10:37:54.5198119Z #53 copying files 900.39MB 3.5s done 2025-09-07T10:38:01.2536337Z #53 DONE 10.4s 2025-09-07T10:38:01.3495391Z 2025-09-07 10:38:01,348 [INFO] cli.lib.core.vllm.vllm_build: Generate GH Summary ... 
2025-09-07T10:38:01.3975256Z ##[group]Run set -eux 2025-09-07T10:38:01.3975719Z set -eux 2025-09-07T10:38:01.3975981Z 2025-09-07T10:38:01.3976528Z # Get these wheels ready, the vllm renaming logic is copied from its .buildkite/scripts/upload-wheels.sh 2025-09-07T10:38:01.3977263Z docker exec -t "${container_name}" bash -c " 2025-09-07T10:38:01.3977662Z set -eux 2025-09-07T10:38:01.3977927Z 2025-09-07T10:38:01.3978393Z nightly=\$(unzip -p torch-* '**/METADATA' | grep '^Version: ' | cut -d' ' -f2 | cut -d'.' -f4) 2025-09-07T10:38:01.3978960Z 2025-09-07T10:38:01.3979227Z pushd externals/vllm/wheels 2025-09-07T10:38:01.3979666Z for package in xformers flashinfer-python vllm; do 2025-09-07T10:38:01.3980112Z pushd \$package 2025-09-07T10:38:01.3980480Z auditwheel repair --plat \$PLATFORM *.whl \ 2025-09-07T10:38:01.3981107Z --exclude libc10* --exclude libtorch* --exclude libcu* --exclude libnv* 2025-09-07T10:38:01.3981723Z repair_wheel=\$(find wheelhouse -name *\${PLATFORM}*) 2025-09-07T10:38:01.3982225Z repair_wheel=\$(basename \${repair_wheel}) 2025-09-07T10:38:01.3982606Z popd 2025-09-07T10:38:01.3982860Z 2025-09-07T10:38:01.3983268Z cp \${package}/wheelhouse/\${repair_wheel} . 2025-09-07T10:38:01.3983957Z version=\$(unzip -p \$repair_wheel '**/METADATA' | grep '^Version: ' | cut -d' ' -f2) 2025-09-07T10:38:01.3984470Z 2025-09-07T10:38:01.3984712Z if [[ \$package == vllm ]]; then 2025-09-07T10:38:01.3985143Z new_wheel=\${repair_wheel/\$version/1.0.0.\$nightly} 2025-09-07T10:38:01.3985532Z else 2025-09-07T10:38:01.3985902Z major_version=\$(echo \$version | tr '.+' '.' | cut -d'.' 
-f1-3) 2025-09-07T10:38:01.3986461Z new_wheel=\${repair_wheel/\$version/\$major_version.\$nightly} 2025-09-07T10:38:01.3986896Z fi 2025-09-07T10:38:01.3987134Z 2025-09-07T10:38:01.3987378Z mv -- \$repair_wheel \$new_wheel 2025-09-07T10:38:01.3987731Z rm -rf \$package 2025-09-07T10:38:01.3988006Z done 2025-09-07T10:38:01.3988243Z popd 2025-09-07T10:38:01.3988464Z " 2025-09-07T10:38:01.3988686Z 2025-09-07T10:38:01.3989025Z docker exec -t "${container_name}" chown -R 1000:1000 /artifacts 2025-09-07T10:38:01.3999194Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:38:01.3999579Z env: 2025-09-07T10:38:01.3999799Z PY_VERS: 3.12 2025-09-07T10:38:01.4000106Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T10:38:01.4000503Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T10:38:01.4000803Z BUILD_DEVICE: cu129 2025-09-07T10:38:01.4001117Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T10:38:01.4001692Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T10:38:01.4002187Z ##[endgroup] 2025-09-07T10:38:01.4031872Z + docker exec -t 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 bash -c ' 2025-09-07T10:38:01.4032480Z set -eux 2025-09-07T10:38:01.4032617Z 2025-09-07T10:38:01.4032992Z nightly=$(unzip -p torch-* '\''**/METADATA'\'' | grep '\''^Version: '\'' | cut -d'\'' '\'' -f2 | cut -d'\''.'\'' -f4) 2025-09-07T10:38:01.4033652Z 2025-09-07T10:38:01.4033772Z pushd externals/vllm/wheels 2025-09-07T10:38:01.4034160Z for package in xformers flashinfer-python vllm; do 2025-09-07T10:38:01.4034562Z pushd $package 2025-09-07T10:38:01.4035143Z auditwheel repair --plat $PLATFORM *.whl --exclude libc10* --exclude libtorch* --exclude libcu* --exclude libnv* 2025-09-07T10:38:01.4035870Z repair_wheel=$(find wheelhouse -name *${PLATFORM}*) 2025-09-07T10:38:01.4036286Z repair_wheel=$(basename ${repair_wheel}) 2025-09-07T10:38:01.4036633Z popd 2025-09-07T10:38:01.4036877Z 
2025-09-07T10:38:01.4037019Z cp ${package}/wheelhouse/${repair_wheel} . 2025-09-07T10:38:01.4037658Z version=$(unzip -p $repair_wheel '\''**/METADATA'\'' | grep '\''^Version: '\'' | cut -d'\'' '\'' -f2) 2025-09-07T10:38:01.4038108Z 2025-09-07T10:38:01.4038238Z if [[ $package == vllm ]]; then 2025-09-07T10:38:01.4038628Z new_wheel=${repair_wheel/$version/1.0.0.$nightly} 2025-09-07T10:38:01.4039000Z else 2025-09-07T10:38:01.4039365Z major_version=$(echo $version | tr '\''.+'\'' '\''.'\'' | cut -d'\''.'\'' -f1-3) 2025-09-07T10:38:01.4039941Z new_wheel=${repair_wheel/$version/$major_version.$nightly} 2025-09-07T10:38:01.4040340Z fi 2025-09-07T10:38:01.4040485Z 2025-09-07T10:38:01.4040603Z mv -- $repair_wheel $new_wheel 2025-09-07T10:38:01.4040924Z rm -rf $package 2025-09-07T10:38:01.4041160Z done 2025-09-07T10:38:01.4041376Z popd 2025-09-07T10:38:01.4041577Z ' 2025-09-07T10:38:01.5894016Z ++ unzip -p torch-2.9.0.dev20250901+cu129-cp312-cp312-manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T10:38:01.5894639Z ++ grep '^Version: ' 2025-09-07T10:38:01.5894921Z ++ cut '-d ' -f2 2025-09-07T10:38:01.5895174Z ++ cut -d. 
-f4 2025-09-07T10:38:01.9951458Z + nightly=dev20250901+cu129 2025-09-07T10:38:01.9951833Z + pushd externals/vllm/wheels 2025-09-07T10:38:01.9952209Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T10:38:01.9952651Z + for package in xformers flashinfer-python vllm 2025-09-07T10:38:01.9953050Z + pushd xformers 2025-09-07T10:38:01.9953568Z /artifacts/externals/vllm/wheels/xformers /artifacts/externals/vllm/wheels /artifacts 2025-09-07T10:38:01.9954839Z + auditwheel repair --plat manylinux_2_28_x86_64 xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl --exclude 'libc10*' --exclude 'libtorch*' --exclude 'libcu*' --exclude 'libnv*' 2025-09-07T10:38:02.2804501Z INFO:auditwheel.main_repair:Repairing xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T10:38:07.5099005Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:38:07.5099552Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:38:07.5100031Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:38:07.5100488Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:38:07.5100943Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:38:07.5101388Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:38:07.7814751Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:38:07.7815234Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:38:07.7815676Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:38:07.7816141Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:38:07.7816597Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:38:07.7817039Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:38:08.0578872Z INFO:auditwheel.main_repair:Wheel is eligible for a higher priority tag. You requested manylinux_2_28_x86_64 but I have found this wheel is eligible for manylinux_2_27_x86_64. 
2025-09-07T10:38:13.2894077Z INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64 2025-09-07T10:38:13.2894831Z INFO:auditwheel.wheeltools:New filename tags: manylinux_2_27_x86_64, manylinux_2_28_x86_64 2025-09-07T10:38:13.2895609Z INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp39-abi3-linux_x86_64 2025-09-07T10:38:13.2896694Z INFO:auditwheel.wheeltools:New WHEEL info tags: cp39-abi3-manylinux_2_27_x86_64, cp39-abi3-manylinux_2_28_x86_64 2025-09-07T10:39:08.7783638Z INFO:auditwheel.main_repair: 2025-09-07T10:39:08.7784781Z Fixed-up wheel written to /artifacts/externals/vllm/wheels/xformers/wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:08.8018169Z ++ find wheelhouse -name '*manylinux_2_28_x86_64*' 2025-09-07T10:39:08.8051399Z + repair_wheel=wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:08.8052888Z ++ basename wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:08.8083374Z + repair_wheel=xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:08.8084037Z + popd 2025-09-07T10:39:08.8084299Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T10:39:08.8085060Z + cp xformers/wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl . 2025-09-07T10:39:08.9575328Z ++ unzip -p xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T10:39:08.9576101Z ++ grep '^Version: ' 2025-09-07T10:39:08.9576386Z ++ cut '-d ' -f2 2025-09-07T10:39:08.9646025Z + version=0.0.33+5d4b92a5.d20250907 2025-09-07T10:39:08.9646421Z + [[ xformers == vllm ]] 2025-09-07T10:39:08.9649977Z ++ echo 0.0.33+5d4b92a5.d20250907 2025-09-07T10:39:08.9650825Z ++ tr .+ . 2025-09-07T10:39:08.9651597Z ++ cut -d. 
-f1-3 2025-09-07T10:39:08.9682245Z + major_version=0.0.33 2025-09-07T10:39:08.9682949Z + new_wheel=xformers-0.0.33.dev20250901+cu129-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:08.9684276Z + mv -- xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl xformers-0.0.33.dev20250901+cu129-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:08.9710717Z + rm -rf xformers 2025-09-07T10:39:09.0191120Z + for package in xformers flashinfer-python vllm 2025-09-07T10:39:09.0191601Z + pushd flashinfer-python 2025-09-07T10:39:09.0192219Z /artifacts/externals/vllm/wheels/flashinfer-python /artifacts/externals/vllm/wheels /artifacts 2025-09-07T10:39:09.1471457Z + auditwheel repair --plat manylinux_2_28_x86_64 flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl --exclude 'libc10*' --exclude 'libtorch*' --exclude 'libcu*' --exclude 'libnv*' 2025-09-07T10:39:09.1472775Z INFO:auditwheel.main_repair:Repairing flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T10:39:12.0046502Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.0047014Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.0047465Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.0047906Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.0048361Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.0049005Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.0778591Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.0779127Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.0779603Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.0780075Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.0780519Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.0780965Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 
2025-09-07T10:39:12.1450975Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.1451451Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.1451921Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.1452393Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.1452835Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.1453277Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.2112472Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.2112951Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.2113391Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.2113844Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.2114286Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.2114705Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.2778695Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.2779159Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.2779850Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.2780379Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.2780839Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.2781270Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.3428329Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.3428911Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.3429354Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.3429808Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.3430236Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.3430669Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.4094703Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.4095165Z INFO:auditwheel.lddtree:Excluding 
libc10_cuda.so 2025-09-07T10:39:12.4095627Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.4096100Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.4096558Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.4096988Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.4773525Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.4774001Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.4774461Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.4774929Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.4775375Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.4775829Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.5426366Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.5426881Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.5427328Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.5427781Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.5428232Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.5428665Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.6140696Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.6141156Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.6141626Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.6142089Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.6142545Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.6142976Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.6820176Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.6820639Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.6821105Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.6821574Z 
INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.6822018Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.6822464Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.7582177Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.7582685Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.7583257Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.7583717Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.7584150Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.7584749Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.8283414Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.8283987Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.8284442Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.8284886Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.8285325Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.8285742Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.8972636Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.8973346Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.8973855Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.8974321Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.8974762Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:12.8975206Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:12.9646024Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:12.9646590Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:12.9647050Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:12.9647495Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:12.9647937Z INFO:auditwheel.lddtree:Excluding libtorch.so 
2025-09-07T10:39:12.9648374Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.0389414Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.0389955Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.0390432Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.0390893Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.0391324Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.0391764Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.1116742Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.1117250Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.1117701Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.1118158Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.1118602Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.1119024Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.1851631Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.1852095Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.1852563Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.1853031Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.1853501Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.1853953Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.2577199Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.2577674Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.2578135Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.2578607Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.2579052Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.2579500Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.3274540Z INFO:auditwheel.lddtree:Excluding 
libc10.so 2025-09-07T10:39:13.3275094Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.3275545Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.3275989Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.3276434Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.3276881Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.3991535Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.3992098Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.3992540Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.3992998Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.3993587Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.3994025Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.4777366Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.4777833Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.4778300Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.4778755Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.4779206Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.4779639Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.5491897Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.5492449Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.5492903Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.5493375Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.5493819Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.5494265Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.6150302Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.6150815Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.6151278Z 
INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.6151732Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.6152192Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.6152624Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.6896014Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.6896491Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.6897038Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.6897495Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.6897949Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.6898384Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.7617477Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.7617987Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.7618485Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.7618951Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.7619389Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.7619833Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.8354481Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.8354981Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.8355489Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.8355971Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.8356428Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.8356859Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.9060379Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.9060860Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.9061397Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.9061867Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 
2025-09-07T10:39:13.9062311Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.9062861Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:13.9741887Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:13.9742364Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:13.9742814Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:13.9743282Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:13.9743839Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:13.9744275Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.0534100Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.0534573Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.0535023Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.0535658Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.0536102Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.0536553Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.1315251Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.1315758Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.1316215Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.1316657Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.1317107Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.1317759Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.2086184Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.2086624Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.2087074Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.2087526Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.2087962Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.2088393Z INFO:auditwheel.lddtree:Excluding 
libcudart.so.12 2025-09-07T10:39:14.2792367Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.2793193Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.2793717Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.2794167Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.2794618Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.2795046Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.3473049Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.3473869Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.3474385Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.3474846Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.3475281Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.3475731Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.4173127Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.4173606Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.4174055Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.4174525Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.4174967Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.4175412Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.4836904Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.4837468Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.4837930Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.4838374Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.4838819Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.4839247Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.5484048Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.5484580Z 
INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.5485023Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.5485473Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.5485909Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.5486325Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.6332748Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.6333222Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.6333685Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.6334155Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.6334618Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.6335050Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.6974126Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.6974583Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.6975194Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.6975652Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.6976104Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.6976547Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.7612680Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.7613156Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.7613603Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.7614065Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.7614695Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.7615766Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.8302816Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.8303294Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.8303867Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 
2025-09-07T10:39:14.8304322Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.8304759Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.8305178Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.8952281Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.8952780Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.8953230Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.8953698Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.8954140Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.8954598Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:14.9771329Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:14.9771993Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:14.9772463Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:14.9772956Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:14.9773419Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:14.9773851Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.0533588Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.0534069Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.0534520Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.0534983Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.0535422Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.0535862Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.1180700Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.1181187Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.1181648Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.1182099Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.1182553Z INFO:auditwheel.lddtree:Excluding 
libtorch.so 2025-09-07T10:39:15.1182988Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.1875709Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.1876250Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.1876710Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.1877164Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.1877593Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.1878033Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.2550820Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.2551279Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.2551759Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.2552222Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.2552680Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.2553116Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.3333265Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.3333752Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.3334205Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.3334674Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.3335120Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.3335566Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.4013923Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.4014409Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.4014877Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.4015529Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.4016047Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.4016480Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.4651236Z 
INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.4651704Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.4652185Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.4652641Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.4653094Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.4653538Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.5338097Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.5338585Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.5339036Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.5339504Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.5339963Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.5340408Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.6012598Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.6013087Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.6013546Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.6014007Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.6014462Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.6014891Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.6724466Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.6725016Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.6725461Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.6725966Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.6726397Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.6726842Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.7436552Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.7437102Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 
2025-09-07T10:39:15.7437564Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.7438003Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.7438454Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.7438870Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.8095609Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.8096080Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.8096546Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.8097021Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.8097462Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.8097906Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.8798275Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.8799175Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.8799678Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.8800142Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.8800586Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.8801163Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:15.9475252Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:15.9475790Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:15.9476244Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:15.9476695Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:15.9477120Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:15.9477552Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:16.0119296Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:16.0119846Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:16.0120480Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:16.0120993Z INFO:auditwheel.lddtree:Excluding 
libtorch_cuda.so 2025-09-07T10:39:16.0121440Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:16.0121857Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:16.0836200Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:16.0837159Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:16.0837702Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:16.0838163Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:16.0838590Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:16.0839020Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:16.1480960Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:16.1481509Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:16.1481951Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:16.1482416Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:16.1482849Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:16.1483281Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:16.4611087Z INFO:auditwheel.main_repair:Wheel is eligible for a higher priority tag. You requested manylinux_2_28_x86_64 but I have found this wheel is eligible for manylinux_2_27_x86_64. 
2025-09-07T10:39:19.3355165Z INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64 2025-09-07T10:39:19.3355932Z INFO:auditwheel.wheeltools:New filename tags: manylinux_2_27_x86_64, manylinux_2_28_x86_64 2025-09-07T10:39:19.3356715Z INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp39-abi3-linux_x86_64 2025-09-07T10:39:19.3357575Z INFO:auditwheel.wheeltools:New WHEEL info tags: cp39-abi3-manylinux_2_27_x86_64, cp39-abi3-manylinux_2_28_x86_64 2025-09-07T10:39:49.2933606Z INFO:auditwheel.main_repair: 2025-09-07T10:39:49.2934776Z Fixed-up wheel written to /artifacts/externals/vllm/wheels/flashinfer-python/wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:49.3251462Z ++ find wheelhouse -name '*manylinux_2_28_x86_64*' 2025-09-07T10:39:49.3310098Z + repair_wheel=wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:49.3311200Z ++ basename wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:49.3340790Z + repair_wheel=flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:49.3341452Z + popd 2025-09-07T10:39:49.3341734Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T10:39:49.3342509Z + cp flashinfer-python/wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl . 2025-09-07T10:39:49.4083836Z ++ unzip -p flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T10:39:49.4084613Z ++ grep '^Version: ' 2025-09-07T10:39:49.4084898Z ++ cut '-d ' -f2 2025-09-07T10:39:49.5981736Z + version=0.2.14.post1 2025-09-07T10:39:49.5982080Z + [[ flashinfer-python == vllm ]] 2025-09-07T10:39:49.5988045Z ++ echo 0.2.14.post1 2025-09-07T10:39:49.5988512Z ++ tr .+ . 2025-09-07T10:39:49.5989689Z ++ cut -d. 
-f1-3 2025-09-07T10:39:49.6016191Z + major_version=0.2.14 2025-09-07T10:39:49.6016863Z + new_wheel=flashinfer_python-0.2.14.dev20250901+cu129-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:49.6018266Z + mv -- flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl flashinfer_python-0.2.14.dev20250901+cu129-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:39:49.6049245Z + rm -rf flashinfer-python 2025-09-07T10:39:49.6302609Z + for package in xformers flashinfer-python vllm 2025-09-07T10:39:49.6303040Z + pushd vllm 2025-09-07T10:39:49.6303851Z /artifacts/externals/vllm/wheels/vllm /artifacts/externals/vllm/wheels /artifacts 2025-09-07T10:39:49.6305216Z + auditwheel repair --plat manylinux_2_28_x86_64 vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl --exclude 'libc10*' --exclude 'libtorch*' --exclude 'libcu*' --exclude 'libnv*' 2025-09-07T10:39:49.7569191Z INFO:auditwheel.main_repair:Repairing vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-linux_x86_64.whl 2025-09-07T10:39:56.9036576Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:56.9037607Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:56.9038114Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T10:39:56.9038566Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:56.9039018Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:56.9039509Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:56.9039930Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:57.1855313Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:57.1856272Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:57.1856780Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T10:39:57.1857232Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:57.1857684Z 
INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:57.1858140Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:57.1858578Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:57.2551433Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:57.2552248Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:57.2552698Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T10:39:57.2553149Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:57.2553594Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:57.2554046Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:57.2554480Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:57.3573124Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T10:39:57.3573911Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:57.3574333Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:57.3574769Z INFO:auditwheel.lddtree:Excluding libnvrtc.so.12 2025-09-07T10:39:57.3575206Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:57.3575660Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:57.3576125Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:57.4307443Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:57.4308252Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T10:39:57.4308690Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T10:39:57.4309135Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:57.4309572Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:57.4310010Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:57.4310431Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:57.5593401Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T10:39:57.5594278Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 
2025-09-07T10:39:57.5594716Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T10:39:57.5595152Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T10:39:57.5595769Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T10:39:57.5596198Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T10:39:57.5596618Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T10:39:58.0548418Z INFO:auditwheel.main_repair:Wheel is eligible for a higher priority tag. You requested manylinux_2_28_x86_64 but I have found this wheel is eligible for manylinux_2_24_x86_64. 2025-09-07T10:40:05.2193435Z INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64 2025-09-07T10:40:05.2194352Z INFO:auditwheel.wheeltools:New filename tags: manylinux_2_24_x86_64, manylinux_2_28_x86_64 2025-09-07T10:40:05.2195550Z INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp38-abi3-linux_x86_64 2025-09-07T10:40:05.2197013Z INFO:auditwheel.wheeltools:New WHEEL info tags: cp38-abi3-manylinux_2_24_x86_64, cp38-abi3-manylinux_2_28_x86_64 2025-09-07T10:41:21.0568079Z INFO:auditwheel.main_repair: 2025-09-07T10:41:21.0569370Z Fixed-up wheel written to /artifacts/externals/vllm/wheels/vllm/wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:41:21.0812036Z ++ find wheelhouse -name '*manylinux_2_28_x86_64*' 2025-09-07T10:41:21.0844438Z + repair_wheel=wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:41:21.0845648Z ++ basename wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:41:21.0877926Z + repair_wheel=vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:41:21.0878672Z + popd 2025-09-07T10:41:21.0878963Z /artifacts/externals/vllm/wheels /artifacts 
2025-09-07T10:41:21.0879755Z + cp vllm/wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl . 2025-09-07T10:41:21.2935887Z ++ unzip -p vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T10:41:21.2936757Z ++ grep '^Version: ' 2025-09-07T10:41:21.2937034Z ++ cut '-d ' -f2 2025-09-07T10:41:21.3906670Z + version=0.10.2rc2.dev125+g4172235ab.d20250907.cu129 2025-09-07T10:41:21.3907354Z + [[ vllm == vllm ]] 2025-09-07T10:41:21.3907872Z + new_wheel=vllm-1.0.0.dev20250901+cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:41:21.3909165Z + mv -- vllm-0.10.2rc2.dev125+g4172235ab.d20250907.cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl vllm-1.0.0.dev20250901+cu129-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T10:41:21.3932876Z + rm -rf vllm 2025-09-07T10:41:21.4538722Z + popd 2025-09-07T10:41:21.4539214Z /artifacts 2025-09-07T10:41:21.4567464Z + docker exec -t 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 chown -R 1000:1000 /artifacts 2025-09-07T10:41:21.5871128Z ##[group]Run actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 2025-09-07T10:41:21.5871673Z with: 2025-09-07T10:41:21.5871959Z name: vllm-wheel-cu129-3.12-manylinux_2_28_x86_64 2025-09-07T10:41:21.5872364Z if-no-files-found: error 2025-09-07T10:41:21.5872872Z path: /home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm/wheels/*.whl 2025-09-07T10:41:21.5873437Z compression-level: 6 2025-09-07T10:41:21.5873703Z overwrite: false 2025-09-07T10:41:21.5873983Z include-hidden-files: false 2025-09-07T10:41:21.5874271Z env: 2025-09-07T10:41:21.5874496Z PY_VERS: 3.12 2025-09-07T10:41:21.5874838Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T10:41:21.5888057Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T10:41:21.5888474Z BUILD_DEVICE: cu129 
2025-09-07T10:41:21.5888853Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T10:41:21.5889516Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T10:41:21.5890050Z ##[endgroup] 2025-09-07T10:41:21.8547637Z With the provided path, there will be 3 files uploaded 2025-09-07T10:41:21.8552007Z Artifact name is valid! 2025-09-07T10:41:21.8553764Z Root directory input is valid! 2025-09-07T10:41:22.1095429Z Beginning upload of artifact content to blob storage 2025-09-07T10:41:22.8488088Z Uploaded bytes 8388608 2025-09-07T10:41:23.1707705Z Uploaded bytes 16777216 2025-09-07T10:41:23.5138147Z Uploaded bytes 25165824 2025-09-07T10:41:23.8757932Z Uploaded bytes 33554432 2025-09-07T10:41:24.2278759Z Uploaded bytes 41943040 2025-09-07T10:41:24.6137079Z Uploaded bytes 50331648 2025-09-07T10:41:25.0288191Z Uploaded bytes 58720256 2025-09-07T10:41:25.3579775Z Uploaded bytes 67108864 2025-09-07T10:41:25.7286612Z Uploaded bytes 75497472 2025-09-07T10:41:26.1293785Z Uploaded bytes 83886080 2025-09-07T10:41:26.5002422Z Uploaded bytes 92274688 2025-09-07T10:41:26.6947540Z Uploaded bytes 100663296 2025-09-07T10:41:26.9948565Z Uploaded bytes 109051904 2025-09-07T10:41:27.3638381Z Uploaded bytes 117440512 2025-09-07T10:41:27.7130524Z Uploaded bytes 125829120 2025-09-07T10:41:28.0850550Z Uploaded bytes 134217728 2025-09-07T10:41:28.4760042Z Uploaded bytes 142606336 2025-09-07T10:41:28.8296518Z Uploaded bytes 150994944 2025-09-07T10:41:29.2096856Z Uploaded bytes 159383552 2025-09-07T10:41:29.5156971Z Uploaded bytes 167772160 2025-09-07T10:41:29.8227575Z Uploaded bytes 176160768 2025-09-07T10:41:30.1418765Z Uploaded bytes 184549376 2025-09-07T10:41:30.4621299Z Uploaded bytes 192937984 2025-09-07T10:41:30.7689382Z Uploaded bytes 201326592 2025-09-07T10:41:31.1072869Z Uploaded bytes 209715200 2025-09-07T10:41:31.4694593Z Uploaded bytes 218103808 2025-09-07T10:41:31.8228992Z Uploaded bytes 226492416 2025-09-07T10:41:32.1639737Z Uploaded 
bytes 234881024 2025-09-07T10:41:32.6017482Z Uploaded bytes 243269632 2025-09-07T10:41:32.8686617Z Uploaded bytes 251658240 2025-09-07T10:41:33.1935061Z Uploaded bytes 260046848 2025-09-07T10:41:33.5261938Z Uploaded bytes 268435456 2025-09-07T10:41:33.8654238Z Uploaded bytes 276824064 2025-09-07T10:41:34.2229285Z Uploaded bytes 285212672 2025-09-07T10:41:34.5610238Z Uploaded bytes 293601280 2025-09-07T10:41:34.9115959Z Uploaded bytes 301989888 2025-09-07T10:41:35.2508108Z Uploaded bytes 310378496 2025-09-07T10:41:35.6100630Z Uploaded bytes 318767104 2025-09-07T10:41:35.9724486Z Uploaded bytes 327155712 2025-09-07T10:41:36.3247706Z Uploaded bytes 335544320 2025-09-07T10:41:36.6732266Z Uploaded bytes 343932928 2025-09-07T10:41:36.9799350Z Uploaded bytes 352321536 2025-09-07T10:41:37.3135872Z Uploaded bytes 360710144 2025-09-07T10:41:37.6364536Z Uploaded bytes 369098752 2025-09-07T10:41:37.9537697Z Uploaded bytes 377487360 2025-09-07T10:41:38.3909950Z Uploaded bytes 385875968 2025-09-07T10:41:38.6724142Z Uploaded bytes 394264576 2025-09-07T10:41:39.0741140Z Uploaded bytes 402653184 2025-09-07T10:41:39.3273082Z Uploaded bytes 411041792 2025-09-07T10:41:39.6855647Z Uploaded bytes 419430400 2025-09-07T10:41:40.1117427Z Uploaded bytes 427819008 2025-09-07T10:41:40.6470780Z Uploaded bytes 436207616 2025-09-07T10:41:40.9074500Z Uploaded bytes 444596224 2025-09-07T10:41:41.2653736Z Uploaded bytes 452984832 2025-09-07T10:41:41.7052568Z Uploaded bytes 461373440 2025-09-07T10:41:41.9249910Z Uploaded bytes 469762048 2025-09-07T10:41:42.3401190Z Uploaded bytes 478150656 2025-09-07T10:41:42.8640228Z Uploaded bytes 486539264 2025-09-07T10:41:43.1823141Z Uploaded bytes 494927872 2025-09-07T10:41:43.5880805Z Uploaded bytes 503316480 2025-09-07T10:41:43.7998174Z Uploaded bytes 511705088 2025-09-07T10:41:44.2135643Z Uploaded bytes 520093696 2025-09-07T10:41:44.4916572Z Uploaded bytes 528482304 2025-09-07T10:41:44.8372903Z Uploaded bytes 536870912 2025-09-07T10:41:45.1666927Z Uploaded 
bytes 545259520 2025-09-07T10:41:45.5687525Z Uploaded bytes 553648128 2025-09-07T10:41:45.9957347Z Uploaded bytes 562036736 2025-09-07T10:41:46.3325336Z Uploaded bytes 570425344 2025-09-07T10:41:46.6846299Z Uploaded bytes 578813952 2025-09-07T10:41:46.9819990Z Uploaded bytes 587202560 2025-09-07T10:41:47.3006129Z Uploaded bytes 595591168 2025-09-07T10:41:47.6249898Z Uploaded bytes 603979776 2025-09-07T10:41:47.9732553Z Uploaded bytes 612368384 2025-09-07T10:41:48.3102413Z Uploaded bytes 620756992 2025-09-07T10:41:48.6160270Z Uploaded bytes 629145600 2025-09-07T10:41:49.0082255Z Uploaded bytes 637534208 2025-09-07T10:41:49.3280260Z Uploaded bytes 645922816 2025-09-07T10:41:49.7586041Z Uploaded bytes 654311424 2025-09-07T10:41:50.0478865Z Uploaded bytes 662700032 2025-09-07T10:41:50.4018493Z Uploaded bytes 671088640 2025-09-07T10:41:50.8581715Z Uploaded bytes 679477248 2025-09-07T10:41:51.1609584Z Uploaded bytes 687865856 2025-09-07T10:41:51.4782111Z Uploaded bytes 696254464 2025-09-07T10:41:51.8217015Z Uploaded bytes 704643072 2025-09-07T10:41:52.1242252Z Uploaded bytes 713031680 2025-09-07T10:41:52.4603557Z Uploaded bytes 721420288 2025-09-07T10:41:52.7689675Z Uploaded bytes 729808896 2025-09-07T10:41:53.1128259Z Uploaded bytes 738197504 2025-09-07T10:41:53.4260925Z Uploaded bytes 746586112 2025-09-07T10:41:53.7433907Z Uploaded bytes 754974720 2025-09-07T10:41:54.1331553Z Uploaded bytes 763363328 2025-09-07T10:41:54.4329120Z Uploaded bytes 771751936 2025-09-07T10:41:54.7735487Z Uploaded bytes 780140544 2025-09-07T10:41:55.1137900Z Uploaded bytes 788529152 2025-09-07T10:41:55.5558517Z Uploaded bytes 796917760 2025-09-07T10:41:55.9511001Z Uploaded bytes 805306368 2025-09-07T10:41:56.2351520Z Uploaded bytes 813694976 2025-09-07T10:41:56.5571464Z Uploaded bytes 822083584 2025-09-07T10:41:57.0109773Z Uploaded bytes 830472192 2025-09-07T10:41:57.3283156Z Uploaded bytes 838860800 2025-09-07T10:41:57.6442880Z Uploaded bytes 847249408 2025-09-07T10:41:57.8854898Z Uploaded 
bytes 855638016 2025-09-07T10:41:58.2106265Z Uploaded bytes 864026624 2025-09-07T10:41:58.5536912Z Uploaded bytes 872415232 2025-09-07T10:41:58.8213510Z Uploaded bytes 880803840 2025-09-07T10:41:59.1646030Z Uploaded bytes 888121166 2025-09-07T10:41:59.1876363Z Finished uploading artifact content to blob storage! 2025-09-07T10:41:59.1878517Z SHA256 hash of uploaded artifact zip is afab2b02d2926eadf527c4d86fc0dc3a2f48a9d9b90ae595f9c8b7a2d65d8c57 2025-09-07T10:41:59.1880649Z Finalizing artifact upload 2025-09-07T10:41:59.2925794Z Artifact vllm-wheel-cu129-3.12-manylinux_2_28_x86_64.zip successfully finalized. Artifact ID 3946963085 2025-09-07T10:41:59.2926878Z Artifact vllm-wheel-cu129-3.12-manylinux_2_28_x86_64 has been successfully uploaded! Final size is 888121166 bytes. Artifact ID is 3946963085 2025-09-07T10:41:59.2933853Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754495/artifacts/3946963085 2025-09-07T10:41:59.3132289Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-09-07T10:41:59.3132839Z with: 2025-09-07T10:41:59.3133063Z env: 2025-09-07T10:41:59.3133309Z PY_VERS: 3.12 2025-09-07T10:41:59.3133679Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T10:41:59.3134134Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T10:41:59.3134479Z BUILD_DEVICE: cu129 2025-09-07T10:41:59.3134833Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T10:41:59.3135498Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T10:41:59.3136064Z ##[endgroup] 2025-09-07T10:41:59.3166819Z ##[group]Run set -eou pipefail 2025-09-07T10:41:59.3167190Z set -eou pipefail 2025-09-07T10:41:59.3167474Z  2025-09-07T10:41:59.3168002Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-09-07T10:41:59.3168486Z for _ in $(seq 1440); do 2025-09-07T10:41:59.3168876Z  # Break if no ssh session exists anymore 2025-09-07T10:41:59.3169233Z  if [ "$(who)" = "" ]; 
then 2025-09-07T10:41:59.3169557Z  break 2025-09-07T10:41:59.3169795Z  fi 2025-09-07T10:41:59.3170043Z  echo "." 2025-09-07T10:41:59.3170300Z  sleep 5 2025-09-07T10:41:59.3170535Z done 2025-09-07T10:41:59.3181500Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:41:59.3181954Z env: 2025-09-07T10:41:59.3182324Z PY_VERS: 3.12 2025-09-07T10:41:59.3182691Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T10:41:59.3183128Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T10:41:59.3183473Z BUILD_DEVICE: cu129 2025-09-07T10:41:59.3183841Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T10:41:59.3184533Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T10:41:59.3185044Z ##[endgroup] 2025-09-07T10:41:59.3221712Z Holding runner for 2 hours until all ssh sessions have logged out 2025-09-07T10:41:59.3326552Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T10:41:59.3327184Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T10:41:59.3327742Z # shellcheck disable=SC2046 2025-09-07T10:41:59.3328116Z docker stop $(docker ps -q) || true 2025-09-07T10:41:59.3328502Z # Prune all of the docker images 2025-09-07T10:41:59.3328855Z docker system prune -af 2025-09-07T10:41:59.3335705Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:41:59.3336137Z env: 2025-09-07T10:41:59.3336385Z PY_VERS: 3.12 2025-09-07T10:41:59.3336733Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T10:41:59.3337182Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T10:41:59.3337507Z BUILD_DEVICE: cu129 2025-09-07T10:41:59.3337874Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T10:41:59.3338507Z container_name: 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T10:41:59.3339085Z ##[endgroup] 2025-09-07T10:42:00.1403764Z 80ef158bd78d 2025-09-07T10:42:01.4225589Z Deleted Containers: 
2025-09-07T10:42:01.4226137Z 80ef158bd78daf0fe4c8ae3b68c62fe43c0adc6495b719743f4f71163edb8e48 2025-09-07T10:42:01.4226541Z 2025-09-07T10:42:01.4952002Z Deleted Images: 2025-09-07T10:42:01.4952409Z untagged: pytorch/manylinux2_28-builder:cuda12.9 2025-09-07T10:42:01.4953217Z untagged: pytorch/manylinux2_28-builder@sha256:af68b954a0a5df04f9f2d7d0181ee3340dec6e378acf0db77a7d3b61d2ecc3aa 2025-09-07T10:42:01.4954189Z deleted: sha256:b3ae6fca04b3f4d5e8fb61129cdeeb9b44e005c8465feb9ecc84bd02eb42ecca 2025-09-07T10:42:01.4954641Z 2025-09-07T10:42:13.0042033Z Deleted build cache objects: 2025-09-07T10:42:13.0042455Z 97mcvtqjeyni6sezz4x7bo45v 2025-09-07T10:42:13.0042771Z yxfkcshumo2crprgk1h8r1ue7 2025-09-07T10:42:13.0044717Z vdhrbq2b57cmcr52q78jso7jl 2025-09-07T10:42:13.0045025Z 0avtkvsj3oht6lonr74gj06pk 2025-09-07T10:42:13.0045706Z 0wq2j99p86wsg95l45ybny6ta 2025-09-07T10:42:13.0046008Z rywhce67y9j3z3ehr9256u4vg 2025-09-07T10:42:13.0046320Z lv7voan1m6fflcd9492c18hdf 2025-09-07T10:42:13.0046620Z jnmbni63c7fst6et3ivz8u0b5 2025-09-07T10:42:13.0046919Z vveu4bfktrgk7yvd999gsp53b 2025-09-07T10:42:13.0047223Z xoh05494k65eq11li7hso8tct 2025-09-07T10:42:13.0047502Z y6arhjl1tzfxom3u02y3ttf92 2025-09-07T10:42:13.0047801Z zhzat6y3h1fqom8x1up4zqzqf 2025-09-07T10:42:13.0048085Z qdtv3oa8m9a54vk5dh2t6uvkg 2025-09-07T10:42:13.0048389Z s1xuk91ff8ubs0vlfbljzkvlc 2025-09-07T10:42:13.0048675Z azc4tyjegrdxgzly8bc3yxqv3 2025-09-07T10:42:13.0049367Z ylewv7rvgxz0c0cv6mhdfsqud 2025-09-07T10:42:13.0049662Z wgsdqtwadw1hxe10iwvhiky0u 2025-09-07T10:42:13.0049974Z 3atga2gf92mpxxic6b44f49tv 2025-09-07T10:42:13.0050277Z vcgq60f81l41q730w89tifuv3 2025-09-07T10:42:13.0050567Z w2hd2tdfpg409qkn3wapmcc0q 2025-09-07T10:42:13.0050875Z m1e31lgbyqui2ljr7v4p5a4je 2025-09-07T10:42:13.0051248Z mte111x5krfw31i7erc91grnh 2025-09-07T10:42:13.0051570Z tc87pzl37mx8b8xknxznzb5q3 2025-09-07T10:42:13.0051862Z muj4x6w40t9g50okrhjhb7ask 2025-09-07T10:42:13.0052197Z l6a4zwbu64lu21ent2z1unv5r 2025-09-07T10:42:13.0052503Z 
vbp6x4didnljrea0lky3ribis 2025-09-07T10:42:13.0052797Z jseydkwoiilh001a3jwnq5fzl 2025-09-07T10:42:13.0053108Z 5jgr9577btfk6flb9ywgx9c09 2025-09-07T10:42:13.0053397Z zo0bxpd5739zgdm1l2mydved6 2025-09-07T10:42:13.0053702Z u2lhojugrjh5mjekm67pzoqvo 2025-09-07T10:42:13.0053993Z wxef22076dx4uw2jzqs63ervw 2025-09-07T10:42:13.0054301Z fpgkm866aqa4cyark2drsnxuh 2025-09-07T10:42:13.0054768Z z0ug8h5ygfn9skk0np4mvkpme 2025-09-07T10:42:13.0055080Z 92itawdp5p0prpjp56y5gqhvm 2025-09-07T10:42:13.0055377Z jtnsi1imp8kalhd79ch5jycjj 2025-09-07T10:42:13.0055681Z lwp98b03sp8bs31h6w76byamc 2025-09-07T10:42:13.0055972Z 71uw54ixan502j61xtupgiiyo 2025-09-07T10:42:13.0056276Z 9nro2cyd7f1cuu9hep19r6m1o 2025-09-07T10:42:13.0056585Z rivsfl2xzoj7cyfq5usvud76q 2025-09-07T10:42:13.0056890Z paz0j1cg9jrtx05bfe0rlb2ad 2025-09-07T10:42:13.0057200Z vjb88bua02xug0skfflt5x1t9 2025-09-07T10:42:13.0057498Z 7797lcezpg32sxy3puauwbll1 2025-09-07T10:42:13.0057802Z 87lh8eyxfwy6erqxe87mnvrm7 2025-09-07T10:42:13.0058094Z xgv1ef193x9t9kuf3u1shrx3r 2025-09-07T10:42:13.0058399Z hxozew3rw8xmvgzscxl433bsj 2025-09-07T10:42:13.0058761Z qp5jm76x0vuww90s132zyocsm 2025-09-07T10:42:13.0059068Z m2f9zg5b1waorlo6wmvpu0qk3 2025-09-07T10:42:13.0059386Z okljxv9kzkuvxj44aoafmmfnf 2025-09-07T10:42:13.0059701Z zswa1fpoqi7wcyzhh3mjckqk0 2025-09-07T10:42:13.0060000Z v6c2a0bzfy9do62c79qzgd61a 2025-09-07T10:42:13.0060306Z kr7xr7mrmx3vitcnmjg804m8h 2025-09-07T10:42:13.0060607Z v2wg9wmp4gjy5jz6vyxfxcdhz 2025-09-07T10:42:13.0060918Z b7wnh5sm6h3dla3aru9p6482l 2025-09-07T10:42:13.0061214Z uodnnosu7hs876b76z1kgvvqe 2025-09-07T10:42:13.0061526Z pyyqauza0aqf9jokkzneucsv0 2025-09-07T10:42:13.0061826Z l5k8nh6467j9clnap95hgadk6 2025-09-07T10:42:13.0062132Z 6x5fobix21lmf4fo44ec1z98t 2025-09-07T10:42:13.0062430Z omujrf7plwjhur8u42po1ypx7 2025-09-07T10:42:13.0062865Z p5pkx52ekbiemu0bk8s5wr9j3 2025-09-07T10:42:13.0063167Z 3diq42831y1nni3mvopckhru5 2025-09-07T10:42:13.0063458Z x12pdvrpq0x41e5gj22c8u9b1 2025-09-07T10:42:13.0063760Z 
ntvap1wu81u0xf50hpb0l13re 2025-09-07T10:42:13.0064048Z 2mn8mnwsysbs1l39fg44i77hn 2025-09-07T10:42:13.0064345Z l6x4i3x1yqlfo6pqf74j53ymp 2025-09-07T10:42:13.0064638Z aiul0dn8crolxgq3iqie111ay 2025-09-07T10:42:13.0064942Z ai4wo1nchguxfkwvebgn5zvpq 2025-09-07T10:42:13.0065229Z jk7l01ry7bxprbmiw814exhbv 2025-09-07T10:42:13.0065529Z bqc6rfjy1j0sujbth2fd0kqye 2025-09-07T10:42:13.0065815Z egwhv9iyg5rzeip3ryqwa4l9e 2025-09-07T10:42:13.0066116Z ehkqubfaonfsv7383qw9vu7n6 2025-09-07T10:42:13.0066407Z nkbmo9ounpj5jsorviczzgntt 2025-09-07T10:42:13.0066710Z sfdwbkenps24kmm497w6y6vz2 2025-09-07T10:42:13.0067010Z ss9bjxy6htvkjgi2xvtu6n6uy 2025-09-07T10:42:13.0067292Z juq1zy4i3k7f6n2uc7atho6se 2025-09-07T10:42:13.0067591Z pjpvlazoch51qh89qlr4rg1cv 2025-09-07T10:42:13.0067873Z 7x943t0jor5k5l36jde7lmqem 2025-09-07T10:42:13.0068167Z pdd9gpkvablcdm3xizhuw3uzf 2025-09-07T10:42:13.0068460Z 4ti3eo8jshpzir0pjaivc7gw8 2025-09-07T10:42:13.0068890Z kiu8zkgyn22nvh6ytnw4igrkf 2025-09-07T10:42:13.0069080Z 2025-09-07T10:42:13.0069195Z Total reclaimed space: 65.97GB 2025-09-07T10:42:13.0161304Z Post job cleanup. 2025-09-07T10:42:13.0219922Z Post job cleanup. 
2025-09-07T10:42:13.1242972Z [command]/usr/bin/git version 2025-09-07T10:42:13.1285241Z git version 2.47.1 2025-09-07T10:42:13.1322867Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/d3c83e84-3523-47be-8a49-02687010eadf/.gitconfig' 2025-09-07T10:42:13.1333430Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/d3c83e84-3523-47be-8a49-02687010eadf' before making global git config changes 2025-09-07T10:42:13.1334614Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T10:42:13.1338769Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-09-07T10:42:13.1380937Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T10:42:13.1418897Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T10:42:13.1772415Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T10:42:13.1794043Z http.https://github.com/.extraheader 2025-09-07T10:42:13.1803219Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T10:42:13.1834416Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T10:42:13.2261793Z A job completed hook has been configured by the self-hosted runner administrator 2025-09-07T10:42:13.2289489Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-09-07T10:42:13.2295608Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:42:13.2296061Z ##[endgroup] 2025-09-07T10:42:13.2392676Z [!ALERT!] Swap in detected! [!ALERT!] 
2025-09-07T10:42:24.6681903Z [!ALERT!] Swap out detected [!ALERT!] 2025-09-07T10:42:43.4544510Z Cleaning up orphan processes